Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Modelling parallel programs and multiprocessor architectures with AXE
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Fineman, Charles E.
1991-01-01
AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images, and overall it speeds up the system by approximately 3.4 times when processing large-scale datasets, demonstrating the clear superiority of our method. The proposed algorithm demonstrates both better edge detection performance and improved time performance. PMID:29861711
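The core of the Otsu-Canny coupling can be sketched in a few lines: Otsu's method picks the high Canny threshold from the image histogram, and the low threshold is derived from it. Below is a minimal serial illustration using OpenCV; the 0.5 low/high ratio is an assumed parameter, and the Hadoop/MapReduce distribution over image shards from the paper is omitted.

```python
import cv2

def otsu_canny(gray, ratio=0.5):
    """Canny edge detection with the high threshold chosen by Otsu's method.

    `ratio` is an illustrative choice, not a value from the paper; the
    MapReduce parallelization over image shards is likewise omitted.
    """
    # Otsu returns the optimal global threshold separating the histogram.
    high, _ = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    low = ratio * high
    return cv2.Canny(gray, int(low), int(high))

# usage: edges = otsu_canny(cv2.imread("img.png", cv2.IMREAD_GRAYSCALE))
```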
PUP: An Architecture to Exploit Parallel Unification in Prolog
1988-03-01
environment stacking model similar to the Warren Abstract Machine [23] since it has been shown to be superior to other known models (see [21]). The storage...execute in groups of independent operations. Unifications belonging to different groups may not overlap. Also unification operations belonging to the...since all parallel operations on the unification units must complete before any of the units can start executing the next group of parallel
Linearly exact parallel closures for slab geometry
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-01
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
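For orientation, Landau-fluid parallel heat-flux closures of the kind this paper generalizes take the following schematic form in wave-number space; the coefficient quoted is the Hammett-Perkins value, shown here for context rather than taken from this abstract:

```latex
% Schematic parallel heat-flux closure in wave-number space
% (Hammett--Perkins form; the paper's closures generalize this).
\tilde{q}_{k} = -\, n_{0}\,\chi_{1}\, v_{t}\,
                \frac{i k_{\parallel}}{\lvert k_{\parallel}\rvert}\,
                \tilde{T}_{k},
\qquad \chi_{1} = \frac{2}{\sqrt{\pi}}, \quad v_{t} = \sqrt{2T_{0}/m}.
```

The factor ik∥/|k∥| is the Fourier symbol of a Hilbert transform, which is why the real-space closures described above are integral (nonlocal) along the field line.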
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2014-05-01
The Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements, to be directly translated into sequences of whole-array (or more generally whole-data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g. the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and, separately, to control storage requirements for intermediate expressions, enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter-based implementations. A prototype open-source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST-to-FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, the parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
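Although MIST is its own language, the loop-to-whole-array translation it automates can be imitated in NumPy; the sketch below renders the 3x3 averaging-filter example from the abstract as nine shifted whole-array additions. This is an illustrative analogy, not MIST syntax, and zero padding at the edges is an arbitrary choice.

```python
import numpy as np

def mean3x3(img):
    """3x3 neighbourhood average expressed as whole-array operations."""
    p = np.pad(img, 1, mode="constant")   # zero padding (assumed behaviour)
    acc = np.zeros(img.shape, dtype=float)
    n, m = img.shape
    for di in (-1, 0, 1):                 # the 3x3 neighbourhood offsets...
        for dj in (-1, 0, 1):             # ...become 9 whole-array adds
            acc += p[1 + di : 1 + di + n, 1 + dj : 1 + dj + m]
    return acc / 9.0
```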
Improving operating room productivity via parallel anesthesia processing.
Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R
2014-01-01
Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoing upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR, in parallel, increases total cases per day and improves efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within the OR and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and, assuming a standard three cases per day, what the predicted end-of-day overtime would be. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the number of days going to overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons; many days may have both regional and general anesthesia. Also, as a single-center case study, the research may have limited generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.
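The capacity logic behind such a simulation reduces to interval arithmetic: sequentially, each case occupies block + surgery + turnover time in the OR; in parallel mode the block is performed outside the OR during the preceding case. A toy Monte Carlo version follows, with all durations invented for illustration (the study used distributions fitted to one year of actual case data).

```python
import random

DAY_MIN = 555  # standard workday from the study, in minutes

def mean_cases_per_day(parallel, trials=10_000):
    """Toy capacity model; all durations are illustrative assumptions."""
    total = 0
    for _ in range(trials):
        t = n = 0
        while True:
            block = random.uniform(15, 30)     # regional block placement
            surgery = random.uniform(60, 120)  # operative time
            turnover = random.uniform(20, 40)  # room turnover
            # In parallel mode the block overlaps the previous case and
            # consumes no OR time.
            dur = surgery + turnover + (0 if parallel else block)
            if t + dur > DAY_MIN:
                break
            t += dur
            n += 1
        total += n
    return total / trials

print(mean_cases_per_day(False), mean_cases_per_day(True))
```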
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hepburn, I.; De Schutter, E., E-mail: erik@oist.jp; Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which has led to the development of parallel methods that can take advantage of the power of modern supercomputers in recent years. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
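The operator-splitting idea under test alternates transport and reaction substeps within each time step. The sketch below shows deterministic Strang splitting for a single decaying species on a 1D periodic grid; it is a stand-in to make the structure concrete, not the paper's stochastic SSA/diffusion operators on irregular meshes, and all parameters are illustrative.

```python
import numpy as np

def strang_step(c, dt, D=1.0, k=0.5, dx=0.1):
    """One Strang split step: half diffusion, full reaction, half diffusion."""
    def diffuse(u, h):
        lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2
        return u + h * D * lap        # explicit Euler, periodic boundary
    c = diffuse(c, dt / 2)
    c = c * np.exp(-k * dt)           # exact solution of dc/dt = -k c
    return diffuse(c, dt / 2)
```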
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach
NASA Technical Reports Server (NTRS)
Mak, Victor W. K.
1986-01-01
Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
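Series-parallel reducibility is what makes hierarchical decomposition tractable: a task system built from SERIES and PARALLEL compositions can be evaluated by recursing on its structure, summing times in series and taking the maximum in parallel. The Monte Carlo sketch below illustrates that recursion for expected makespan; exponential task times are an assumption, and the paper's queueing-network contention model is not reproduced.

```python
import random

def makespan(node):
    """Sample one makespan of a series-parallel task tree.

    node is ("task", mean_time) | ("series", [children]) | ("parallel", [children]).
    """
    kind = node[0]
    if kind == "task":
        return random.expovariate(1.0 / node[1])
    times = [makespan(child) for child in node[1]]
    return sum(times) if kind == "series" else max(times)

system = ("series", [("task", 2.0),
                     ("parallel", [("task", 3.0), ("task", 5.0)])])
est = sum(makespan(system) for _ in range(20_000)) / 20_000
print(f"estimated mean makespan: {est:.2f}")
```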
Parallel dispatch: a new paradigm of electrical power system dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang
Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top-level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.
NASA Technical Reports Server (NTRS)
Hooey, Becky Lee; Gore, Brian Francis; Mahlstedt, Eric; Foyle, David C.
2013-01-01
The objectives of the current research were to develop valid human performance models (HPMs) of approach and land operations; use these models to evaluate the impact of NextGen Closely Spaced Parallel Operations (CSPO) on pilot performance; and draw conclusions regarding flight deck display design and pilot-ATC roles and responsibilities for NextGen CSPO concepts. This document presents guidelines and implications for flight deck display designs and candidate roles and responsibilities. A companion document (Gore, Hooey, Mahlstedt, & Foyle, 2013) provides complete scenario descriptions and results including predictions of pilot workload, visual attention and time to detect off-nominal events.
Performance Improvements of the CYCOFOS Flow Model
NASA Astrophysics Data System (ADS)
Radhakrishnan, Hari; Moulitsas, Irene; Syrakos, Alexandros; Zodiatis, George; Nikolaides, Andreas; Hayes, Daniel; Georgiou, Georgios C.
2013-04-01
The CYCOFOS-Cyprus Coastal Ocean Forecasting and Observing System has been operational since early 2002, providing daily sea current, temperature, salinity and sea level forecasting data for the next 4 and 10 days to end-users in the Levantine Basin, necessary for operational applications in marine safety, particularly concerning oil spill and floating object predictions. The CYCOFOS flow model, similar to most of the coastal and sub-regional operational hydrodynamic forecasting systems of the MONGOOS-Mediterranean Oceanographic Network for Global Ocean Observing System, is based on the POM-Princeton Ocean Model. CYCOFOS is nested with the MyOcean Mediterranean regional forecasting data and with SKIRON and ECMWF for surface forcing. The increasing demand for higher and higher resolution data to meet coastal and offshore downstream applications motivated the parallelization of the CYCOFOS POM model. This development was carried out in the frame of the IPcycofos project, funded by the Cyprus Research Promotion Foundation. Parallel processing provides a viable solution to satisfy these demands without sacrificing accuracy or omitting any physical phenomena. Prior to the IPcycofos project, there had been several attempts to parallelise the POM, for example the MP-POM. These existing parallel code models rely on specific outdated hardware architectures and associated software. The objective of the IPcycofos project is to produce an operational parallel version of the CYCOFOS POM code that can replicate the results of the serial version of the POM code used in CYCOFOS. The parallelization of the CYCOFOS POM model uses the Message Passing Interface (MPI), implemented on commodity computing clusters running open-source software and not depending on any specialized vendor hardware. The parallel CYCOFOS POM code is constructed in a modular fashion, allowing a fast re-locatable downscaled implementation. The MPI implementation takes advantage of the Cartesian nature of the POM mesh and uses the built-in functionality of MPI routines to split the mesh, using a weighting scheme, along longitude and latitude among the processors. Each processor works on its part of the model based on domain decomposition techniques. The new parallel CYCOFOS POM code has been benchmarked against the serial POM version of CYCOFOS for speed, accuracy, and resolution, and the results are more than satisfactory. With a higher resolution CYCOFOS Levantine model domain, the forecasts need much less time than the coarser serial CYCOFOS POM version, with identical accuracy.
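The longitude/latitude splitting described above maps naturally onto MPI's Cartesian topology support. The mpi4py sketch below shows that decomposition pattern only; it is not the IPcycofos code, and the uniform tile split stands in for the project's weighted scheme.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
dims = MPI.Compute_dims(comm.Get_size(), 2)       # lon x lat process grid
cart = comm.Create_cart(dims, periods=[False, False], reorder=True)
i, j = cart.Get_coords(cart.Get_rank())

NX, NY = 480, 240                       # illustrative global grid size
nx, ny = NX // dims[0], NY // dims[1]   # local tile (uniform split shown)

# Neighbour ranks for halo exchange along each dimension.
west, east = cart.Shift(0, 1)
south, north = cart.Shift(1, 1)
print(f"rank {cart.Get_rank()}: tile ({i},{j}) of {dims}, {nx}x{ny} points")
```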
Reliability models for dataflow computer systems
NASA Technical Reports Server (NTRS)
Kavi, K. M.; Buckles, B. P.
1985-01-01
The demands for concurrent operation within a computer system and the representation of parallelism in programming languages have yielded a new form of program representation known as data flow (DENN 74, DENN 75, TREL 82a). A new model based on data flow principles for parallel computations and parallel computer systems is presented. Necessary conditions for liveness and deadlock freeness in data flow graphs are derived. The data flow graph is used as a model to represent asynchronous concurrent computer architectures including data flow computers.
SCaLeM: A Framework for Characterizing and Analyzing Execution Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Manzano Franco, Joseph B.; Krishnamoorthy, Sriram
2014-10-13
As scalable parallel systems evolve towards more complex nodes with many-core architectures and larger trans-petascale & upcoming exascale deployments, there is a need to understand, characterize and quantify the underlying execution models being used on such systems. Execution models are a conceptual layer between applications & algorithms and the underlying parallel hardware and systems software on which those applications run. This paper presents the SCaLeM (Synchronization, Concurrency, Locality, Memory) framework for characterizing and analyzing execution models. SCaLeM consists of three basic elements: attributes, compositions and mapping of these compositions to abstract parallel systems. The fundamental Synchronization, Concurrency, Locality and Memory attributes are used to characterize each execution model, while the combinations of those attributes in the form of compositions are used to describe the primitive operations of the execution model. The mapping of the execution model's primitive operations, described by compositions, to an underlying abstract parallel system can be evaluated quantitatively to determine its effectiveness. Finally, SCaLeM also enables the representation and analysis of applications in terms of execution models, for the purpose of evaluating the effectiveness of such mapping.
Karasick, Michael S.; Strip, David R.
1996-01-01
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
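Since the patent names only the fields a d-edge carries, a hypothetical Python rendering of the record is enough to show why no inter-processor communication is needed: each record pairs one edge with exactly one incident face and is owned by one labeled processor.

```python
from dataclasses import dataclass

Vertex = tuple[float, float, float]

@dataclass(frozen=True)
class DEdge:
    """Directed-edge (d-edge) record; field names are hypothetical, the
    abstract specifies only vertex descriptions plus one incident face."""
    tail: Vertex        # vertex description: edge start
    head: Vertex        # vertex description: edge end
    face_id: int        # the single face this d-edge belongs to
    owner_label: int    # unique label of the processor that built it

# Each processor builds its own DEdge records in response to a modelling
# command and can operate on them without talking to other processors.
```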
NASA Astrophysics Data System (ADS)
Coudarcher, Rémi; Duculty, Florent; Serot, Jocelyn; Jurie, Frédéric; Derutin, Jean-Pierre; Dhome, Michel
2005-12-01
SKiPPER is a SKeleton-based Parallel Programming EnviRonment developed since 1996 at the LASMEA Laboratory, the Blaise-Pascal University, France. The main goal of the project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This paper deals with the special features embedded in the latest version of the project: algorithmic skeleton nesting capabilities and a fully dynamic operating model. Through the case study of a complete and realistic image processing application, in which we have pointed out the requirement for skeleton nesting, we present the operating model of this feature. The work described here is one of the few reported experiments showing the application of skeleton nesting facilities to the parallelisation of a realistic application, especially in the area of image processing. The image processing application we have chosen is a 3D face-tracking algorithm from appearance.
Parallel Reconstruction Using Null Operations (PRUNO)
Zhang, Jian; Liu, Chunlei; Moseley, Michael E.
2011-01-01
A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated using Singular Value Decomposition (SVD), and an iterative conjugate-gradient approach is proposed to efficiently solve for missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high acceleration rates. With the aid of PRUNO reconstruction, ultra-high-acceleration parallel imaging can be performed with decent image quality. For example, we have performed successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
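The "null operations" come from the null space of a calibration matrix whose rows are vectorized multi-coil k-space patches from the ACS region: right singular vectors with near-zero singular values annihilate any consistent neighbourhood. A generic NumPy sketch of that calibration step follows; window size and tolerance are illustrative, and the conjugate-gradient reconstruction loop is omitted.

```python
import numpy as np

def calibrate_null_ops(acs, kx=5, ky=5, tol=1e-3):
    """Extract null-space operators from ACS k-space data via SVD.

    acs: complex array (ncoil, nx, ny). Returns rows v with A @ v ~ 0,
    where rows of A are vectorized kx-by-ky multi-coil patches.
    """
    ncoil, nx, ny = acs.shape
    rows = [acs[:, x:x + kx, y:y + ky].ravel()
            for x in range(nx - kx + 1)
            for y in range(ny - ky + 1)]
    A = np.array(rows)
    _, s, vh = np.linalg.svd(A, full_matrices=True)
    s_full = np.concatenate([s, np.zeros(vh.shape[0] - len(s))])
    return vh[s_full < tol * s[0]].conj()   # each row: one nulling operator
```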
The Extended Parallel Process Model: Illuminating the Gaps in Research
ERIC Educational Resources Information Center
Popova, Lucy
2012-01-01
This article examines constructs, propositions, and assumptions of the extended parallel process model (EPPM). Review of the EPPM literature reveals that its theoretical concepts are thoroughly developed, but the theory lacks consistency in operational definitions of some of its constructs. Out of the 12 propositions of the EPPM, a few have not…
Parallelization and automatic data distribution for nuclear reactor simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hirata, So
2003-11-20
We develop a symbolic manipulation program and program generator (Tensor Contraction Engine, or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted on by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirements, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
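The binary-contraction-ordering problem TCE solves at code-generation time is the same one NumPy exposes at run time through einsum_path: the FLOP and memory cost of a multi-tensor contraction depends strongly on pairing order. A small run-time illustration follows (not TCE output; TCE additionally factorizes common intermediates and exploits the symmetries listed above):

```python
import numpy as np

# Toy three-tensor contraction; index ranges chosen so order matters.
A = np.random.rand(8, 64)    # indices i,j
B = np.random.rand(64, 64)   # indices j,k
C = np.random.rand(64, 8)    # indices k,l

# Ask NumPy for the cheapest binary contraction order, analogous to
# TCE's minimal-operation-cost ordering.
path, report = np.einsum_path('ij,jk,kl->il', A, B, C, optimize='optimal')
print(report)   # naive vs. optimized FLOP counts and the chosen pairing
```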
Research on Parallel Three Phase PWM Converters base on RTDS
NASA Astrophysics Data System (ADS)
Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun
2018-01-01
Parallel operation of converters can increase the capacity of the system, but it may lead to zero-sequence circulating current, so the control of circulating current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the converters' parallel system in real time and to study circulating current suppression. The equivalent model of two parallel converters and the zero-sequence circulating current (ZSCC) were established and analyzed, and a strategy using variable zero vector control was proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were implemented; the results prove that the proposed control strategy is feasible and effective.
Synchronization Of Parallel Discrete Event Simulations
NASA Technical Reports Server (NTRS)
Steinman, Jeffrey S.
1992-01-01
Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
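A conceptual sketch of one Breathing Time Buckets cycle, reconstructed from the description above (not SPEEDES source code): events are processed optimistically in time order while the earliest timestamp of any newly generated message, the event horizon, is tracked; only events below the final horizon commit.

```python
def breathing_time_bucket(pending, process):
    """One adaptive cycle; `pending` holds (timestamp, payload) events and
    `process` returns the messages an event generates. Conceptual only."""
    horizon = float("inf")
    processed = []
    for ev in sorted(pending):
        if ev[0] >= horizon:            # beyond the horizon: stop the cycle
            break
        processed.append(ev)
        for msg in process(ev):         # optimistic execution
            horizon = min(horizon, msg[0])
    committed = [e for e in processed if e[0] < horizon]
    return committed, horizon           # the rest roll back next cycle
```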
User's guide to the Parallel Processing Extension of the Prognosis Model
Nicholas L. Crookston; Albert R. Stage
1991-01-01
The Parallel Processing Extension (PPE) of the Prognosis Model was designed to analyze responses of numerous stands to coordinated management and pest impacts that operate at the landscape level of forests. Vegetation-related resource supply analysis can be readily performed for a thousand or more sample stands for projections 400 years into the future. Capabilities...
Jones, Ryan J. R.; Shinde, Aniketa; Guevarra, Dan; ...
2015-01-05
Many energy technologies require electrochemical stability or preactivation of functional materials. Due to the long experiment duration required for either electrochemical preactivation or evaluation of operational stability, parallel screening is required to enable high-throughput experimentation. We found that imposing operational electrochemical conditions on a library of materials in parallel creates several opportunities for experimental artifacts. We discuss the electrochemical engineering principles and operational parameters that mitigate artifacts in the parallel electrochemical treatment system. We also demonstrate the effects of resistive losses within the planar working electrode through a combination of finite element modeling and illustrative experiments. Operation of the parallel-plate, membrane-separated electrochemical treatment system is demonstrated by exposing a composition library of mixed metal oxides to oxygen evolution conditions in 1 M sulfuric acid for 2 h. This application is particularly important because the electrolysis and photoelectrolysis of water are promising future energy technologies inhibited by the lack of highly active, acid-stable catalysts containing only earth-abundant elements.
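The resistive-loss effect is easy to estimate in one dimension: current collected uniformly across the plate must return through the electrode's sheet resistance to the contact, so the operating potential sags with distance. The sketch below is a 1D stand-in for the paper's finite element model, with every number invented for illustration.

```python
import numpy as np

L, w = 0.1, 0.1   # electrode length and width, m (assumed)
Rs = 10.0         # sheet resistance, ohm/square (assumed)
j = 50.0          # current density drawn by the reaction, A/m^2 (assumed)

x = np.linspace(0.0, L, 200)
I = j * w * (L - x)                  # current crossing position x
# Potential drop from the contact: integrate I * (Rs / w) dx.
dV = np.cumsum(I * (Rs / w) * np.gradient(x))
print(f"max ohmic drop across the plate: {dV[-1]:.2f} V")
```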
NASA Astrophysics Data System (ADS)
Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.
1996-06-01
A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed- digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtractor is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.
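What makes MSD arithmetic attractive for parallel optics is that addition is carry-free: with digits in {-1, 0, 1}, a two-step rule bounds carry propagation to a single position, so all digit positions can be processed simultaneously. The sketch below is the textbook two-step signed-digit rule in software, included only to make the algorithm concrete; it is not the authors' space-position encoding or optoelectronic hardware.

```python
def msd_add(a, b):
    """Carry-free MSD addition of little-endian digit lists over {-1, 0, 1}.

    Two passes, each position depending only on its lower neighbour, so
    every digit could be computed in parallel (textbook rule)."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    p = [a[i] + b[i] for i in range(n)]            # position sums in [-2, 2]
    w, t = [0] * n, [0] * (n + 1)
    for i in range(n):
        low = i > 0 and p[i - 1] >= 1              # lower pair may carry +1
        if p[i] == 2:
            t[i + 1], w[i] = 1, 0
        elif p[i] == 1:
            t[i + 1], w[i] = (1, -1) if low else (0, 1)
        elif p[i] == -1:
            t[i + 1], w[i] = (0, -1) if low else (-1, 1)
        elif p[i] == -2:
            t[i + 1], w[i] = -1, 0
    return [w[i] + t[i] for i in range(n)] + [t[n]]  # digit-wise, no ripple

def msd_value(d):
    return sum(x * 2**i for i, x in enumerate(d))

assert msd_value(msd_add([1, 1, 1], [1, 0, 1])) == 7 + 5   # 12
```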
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
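The serial edge-switch primitive is simple to state; the paper's contribution is performing many such switches concurrently without violating simplicity. For reference, a plain-Python version of the primitive with the self-loop and parallel-edge rejection tests:

```python
import random

def edge_switch(edges):
    """Attempt one switch on a simple undirected graph.

    edges: set of frozenset({u, v}). Picks edges {u,v} and {x,y}, proposes
    {u,x} and {v,y} (degree-preserving), and rejects switches that would
    create a self-loop or duplicate edge. Returns True on success."""
    (u, v), (x, y) = random.sample(sorted(tuple(e) for e in edges), 2)
    new1, new2 = frozenset((u, x)), frozenset((v, y))
    if len(new1) < 2 or len(new2) < 2 or new1 == new2:   # self-loop
        return False
    if new1 in edges or new2 in edges:                   # parallel edge
        return False
    edges -= {frozenset((u, v)), frozenset((x, y))}
    edges |= {new1, new2}
    return True
```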
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vinogradov, A. Yu., E-mail: vinogradov-a@ntcees.ru; Gerasimov, A. S.; Kozlov, A. V.
Consideration is given to different approaches to modeling the control systems of gas turbines as a component of CCPP and GTPP to ensure their reliable parallel operation in the UPS of Russia. The disadvantages of the approaches to the modeling of combined-cycle units in studying long-term electromechanical transients accompanied by power imbalance are pointed out. Examples are presented to support the use of more detailed models of gas turbines in electromechanical transient calculations. It is shown that the modern speed control systems of gas turbines in combination with relatively low equivalent inertia have a considerable effect on electromechanical transients, including those caused by disturbances not related to power imbalance.
Xyce Parallel Electronic Simulator : users' guide, version 2.0.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont
2004-06-01
This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving on the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques; device models specifically tailored to meet Sandia's needs, including many radiation-aware devices; a client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI); and object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase: a message-passing parallel implementation that allows it to run efficiently on the widest possible number of computing platforms, including serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs: standard analytical models, behavioral models, look-up tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important features of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.
A hybrid parallel framework for the cellular Potts model simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Yi; He, Kejing; Dong, Shoubin
2009-01-01
The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large-scale, complex 3D simulation. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on a shared-memory SMP system using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of the complex collective behavior of numerous cells (~10^6).
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Geveci, Berk
2014-11-01
The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based on what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today's distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked, unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
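The filter/worklet contrast can be made concrete: a worklet is a stateless per-element function with no pipeline state or neighbour access, so a runtime is free to schedule it over any number of threads. The Python sketch below illustrates only the programming pattern; Python threads are a stand-in that does not deliver the fine-grained hardware parallelism the project targets, and this is not the project's API.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def magnitude_worklet(v):
    """A worklet: stateless, per-element, invoked rather than pipelined."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)

vectors = [(1.0, 2.0, 2.0)] * 100_000
with ThreadPoolExecutor() as pool:
    mags = list(pool.map(magnitude_worklet, vectors, chunksize=1_000))
print(mags[0])   # 3.0
```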
Temperature Control with Two Parallel Small Loop Heat Pipes for GLM Program
NASA Technical Reports Server (NTRS)
Khrustalev, Dmitry; Stouffer, Chuck; Ku, Jentung; Hamilton, Jon; Anderson, Mark
2014-01-01
The concept of temperature control of an electronic component using a single Loop Heat Pipe (LHP) is well established for aerospace applications. Using two LHPs is often desirable for redundancy/reliability reasons or for increasing the overall heat source-sink thermal conductance. This effort elaborates on the temperature-controlling operation of a thermal system that includes two small ammonia LHPs thermally coupled together at the evaporator end as well as at the condenser end and operating "in parallel". A transient model of the LHP system was developed on the Thermal Desktop (trademark) platform to understand some fundamental details of such parallel operation of the two LHPs. Extensive thermal-vacuum testing was conducted with two thermally coupled LHPs operating simultaneously as well as with only one LHP operating at a time. This paper outlines the temperature control procedures for two LHPs operating simultaneously with widely varying sink temperatures. The test data obtained during the thermal-vacuum testing, with both LHPs running simultaneously in comparison with only one LHP operating at a time, are presented with detailed explanations.
Modeling and controller design of a 6-DOF precision positioning system
NASA Astrophysics Data System (ADS)
Cai, Kunhai; Tian, Yanling; Liu, Xianping; Fatikow, Sergej; Wang, Fujun; Cui, Liangyu; Zhang, Dawei; Shirinzadeh, Bijan
2018-05-01
A key hurdle to meeting the needs of micro/nano manipulation in some complex cases is the inadequate workspace and flexibility of the operation ends. This paper presents a 6-degree-of-freedom (DOF) serial-parallel precision positioning system, which consists of two compact 3-DOF parallel mechanisms. Each parallel mechanism is driven by three piezoelectric actuators (PEAs), guided by three symmetric T-shape hinges and three elliptical flexible hinges, respectively. It can extend the workspace and improve the flexibility of the operation ends. The proposed system can be assembled easily, which will greatly reduce assembly errors and improve positioning accuracy. In addition, the kinematic and dynamic models of the 6-DOF system are established. Furthermore, in order to reduce the tracking error and improve the positioning accuracy, a Discrete-time Model Predictive Controller (DMPC) is applied as an effective control method, and the effectiveness of the DMPC control method is verified. Finally, a tracking experiment is performed to verify the tracking performance of the 6-DOF stage.
Retargeting of existing FORTRAN program and development of parallel compilers
NASA Technical Reports Server (NTRS)
Agrawal, Dharma P.
1988-01-01
The software models used in implementing the parallelizing compiler for the B-HIVE multiprocessor system are described. The various models and strategies used in the compiler development are: the flexible granularity model, which allows a compromise between two extreme granularity models; the communication model, which is capable of precisely describing interprocessor communication timings and patterns; the loop type detection strategy, which identifies different types of loops; the critical path with coloring scheme, which is a versatile scheduling strategy for any multicomputer with associated communication costs; and the loop allocation strategy, which realizes optimum overlapped operations between computation and communication in the system. Using these models, several sample routines of the AIR3D package are examined and tested. It may be noted that the automatically generated codes are highly parallelized to provide the maximum degree of parallelism, with speedup scaling up to a 28- to 32-processor system. A comparison of parallel codes for both the existing and the proposed communication model is performed, and the corresponding expected speedup factors are obtained. The experimentation shows that the B-HIVE compiler produces more efficient codes than existing techniques. Work is progressing well in completing the final phase of the compiler. Numerous enhancements are needed to improve the capabilities of the parallelizing compiler.
NASA Astrophysics Data System (ADS)
Ganesan, Nandhini; Basu, Suman; Hariharan, Krishnan S.; Kolake, Subramanya Mayya; Song, Taewon; Yeo, Taejung; Sohn, Dong Kee; Doo, Seokgwang
2016-08-01
Lithium-ion batteries used for electric vehicle applications are subject to large currents and various operating conditions, making battery pack design and life extension a challenging problem. With increasing complexity, modeling and simulation can lead to insights that ensure optimal performance and life extension. In this manuscript, an electrochemical-thermal (ECT) coupled model for a 6 series × 5 parallel pack is developed for Li-ion cells with NCA/C electrodes and validated against experimental data. The contribution of the cathode to overall degradation at various operating conditions is assessed. Pack asymmetry is analyzed from a design and an operational perspective. Design-based asymmetry leads to a new approach for obtaining the individual cell responses of the pack from an average ECT output. Operational asymmetry is demonstrated in terms of the effects of thermal gradients on cycle life, and an efficient model predictive control technique is developed. The concept of a reconfigurable battery pack is studied using detailed simulations that can be used for effective monitoring and extension of battery pack life.
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT) on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using benchmark problems from the literature and obtain reproducible results. Extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
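The named scheme advances the wavefunction by alternating kinetic substeps in momentum space (where the kinetic operator is diagonal) with potential substeps in real space. A minimal 1D version with hbar = m = 1 is sketched below; the paper's solver is 3D, MPI-decomposed, and handles time-dependent vector potentials, none of which is shown here.

```python
import numpy as np

def split_operator_step(psi, V, dx, dt):
    """One Strang split step of the 1D TDSE (hbar = m = 1).

    Kinetic half-steps apply exp(-i (k^2/2) dt/2) in k-space via FFT; the
    potential full-step applies exp(-i V dt) on the grid. Periodic
    boundaries are implied by the FFT."""
    k = 2.0 * np.pi * np.fft.fftfreq(psi.size, d=dx)
    half_kin = np.exp(-0.5j * dt * k**2 / 2.0)
    psi = np.fft.ifft(half_kin * np.fft.fft(psi))
    psi = np.exp(-1j * dt * V) * psi
    return np.fft.ifft(half_kin * np.fft.fft(psi))
```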
NASA Astrophysics Data System (ADS)
Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.
2015-12-01
We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms, including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique among existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and sending a new candidate solution for evaluation. One key advance in this study is developing the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption terminates simulation model runs early (prior to completely simulating the model calibration period, for example) when intermediate results indicate that the candidate solution is so poor that it will have no influence on the generation of further candidate solutions. The computational savings of deterministic model pre-emption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations of these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems, from multi-core desktops to a supercomputer system) and package them for future modellers within a model-independent calibration software package called Ostrich, as well as in MATLAB versions. Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration, and in many cases linear or near-linear speedups are observed.
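For readers unfamiliar with the serial algorithm being parallelized: DDS perturbs a dynamically shrinking random subset of decision variables around the current best solution. The sketch below follows the published DDS candidate-generation rule (Tolson and Shoemaker, 2007) with the usual r = 0.2 neighbourhood parameter; the asynchronous dispatch of candidates to workers, which is this paper's contribution, is not shown.

```python
import math
import random

def dds_candidate(best, lo, hi, i, max_iter, r=0.2):
    """Generate one DDS candidate around the current best solution.

    Each variable is perturbed with probability 1 - ln(i)/ln(max_iter)
    (at least one always is), by N(0, r * variable_range), reflecting at
    the bounds."""
    p = 1.0 - math.log(i) / math.log(max_iter)
    dims = [d for d in range(len(best)) if random.random() < p]
    if not dims:
        dims = [random.randrange(len(best))]
    cand = list(best)
    for d in dims:
        x = cand[d] + random.gauss(0.0, r * (hi[d] - lo[d]))
        if x < lo[d]:
            x = lo[d] + (lo[d] - x)          # reflect off the lower bound
        if x > hi[d]:
            x = hi[d] - (x - hi[d])          # reflect off the upper bound
        cand[d] = min(max(x, lo[d]), hi[d])  # clamp if reflection overshoots
    return cand
```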
Bumči, Igor; Vlahović, Tomislav; Jurić, Filip; Žganjer, Mirko; Miličić, Gordana; Wolf, Hinko; Antabak, Anko
2015-11-01
Paediatric ankle fractures comprise approximately 4% of all paediatric fractures and 30% of all epiphyseal fractures. Integrity of the ankle "mortise", which consists of the tibial and fibular malleoli, is significant for stability and function of the ankle joint. Tibial malleolar fractures are classified as SH III or SH IV intra-articular fractures and, in cases where the fragments are displaced, anatomic reposition and fixation is mandatory. Type SH III-IV fractures of the tibial malleolus are usually treated with open reduction and fixation with cannulated screws that are parallel to the physis. Two K-wires are used for temporary stabilisation of fragments during reduction. A third "guide wire" for the screw is then placed parallel with the physis. Considering the rules of mechanics, it is assumed that the two temporary pins with the additional third pin placed parallel to the physis create a strong triangle and thus provide strong fracture fixation. To test this hypothesis, an experiment was conducted on artificial models of the lower end of the tibia from the company "Sawbones". Each model had been sawn in a way that imitates a fracture of the medial malleolus and then reattached with 1.8 mm pins in various combinations. The prepared models were then tested under tensile and compressive forces. The least stable model was that in which the fractured pieces were attached with only two parallel pins. The most stable model comprised three pins, where two crossed pins were inserted in the opposite compact bone and the third pin was inserted through the epiphysis parallel with and below the growth plate. A potential method of choice for fixation of tibial malleolar fractures comprises three K-wires, where two crossed pins are placed in the opposite compact bone and one is parallel with the growth plate. The benefits associated with this method include shorter operating times and avoidance of a second operation for screw removal.
Model-Based Systems Engineering in the Execution of Search and Rescue Operations
2015-09-01
OSC can fulfill the duties of an ACO but it may make sense to split the duties if there are no communication links between the OSC and participating...parallel mode. This mode is the most powerful option because it creates sequence diagrams that generate parallel "swim lanes" for each asset...greater flexibility is desired, sequence mode generates diagrams based purely on sequential action and activity diagrams without the parallel "swim lanes"
Hierarchical analytical and simulation modelling of human-machine systems with interference
NASA Astrophysics Data System (ADS)
Braginsky, M. Ya; Tarakanov, D. V.; Tsapko, S. G.; Tsapko, I. V.; Baglaeva, E. A.
2017-01-01
The article considers the principles of building an analytical and simulation model of the human operator and of the industrial control system hardware and software. E-networks, an extension of Petri nets, are used as the mathematical apparatus. This approach allows simulating complex parallel distributed processes in human-machine systems. A structural, hierarchical approach is used to build the mathematical model of the human operator. The upper level of the human operator model is a logical dynamic model of decision making based on E-networks. The lower level reflects the psychophysiological characteristics of the human operator.
Longitudinal train dynamics: an overview
NASA Astrophysics Data System (ADS)
Wu, Qing; Spiryagin, Maksym; Cole, Colin
2016-12-01
This paper discusses the evolution of longitudinal train dynamics (LTD) simulations, which covers numerical solvers, vehicle connection systems, air brake systems, wagon dumper systems and locomotives, resistance forces and gravitational components, vehicle in-train instabilities, and computing schemes. A number of potential research topics are suggested, such as modelling of friction, polymer, and transition characteristics for vehicle connection simulations, studies of wagon dumping operations, proper modelling of vehicle in-train instabilities, and computing schemes for LTD simulations. Evidence shows that LTD simulations have evolved with computing capabilities. Currently, advanced component models that directly describe the working principles of the operation of air brake systems, vehicle connection systems, and traction systems are available. Parallel computing is a good solution to combine and simulate all these advanced models. Parallel computing can also be used to conduct three-dimensional long train dynamics simulations.
Comparison between four dissimilar solar panel configurations
NASA Astrophysics Data System (ADS)
Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.
2017-12-01
Several studies on photovoltaic systems have focused on how they operate and the energy required to operate them. Little attention has been paid to their configurations, the modeling of mean time to system failure, availability, cost-benefit analysis, and comparisons of parallel and series-parallel designs. In this work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. Comparative analysis was performed using the Chapman-Kolmogorov method. Explicit expressions for mean time to system failure, steady-state availability, and cost-benefit were derived for the comparison. A ranking method was used to determine the optimal configuration of the systems. The analytical and numerical solutions for system availability and mean time to system failure show that configuration I is the optimal configuration.
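For the flavor of such a comparison, the sketch below computes steady-state availability and a no-repair mean time to system failure for parallel and series-parallel arrangements under the simplest assumptions (independent units, exponential failure and repair). The rates and the exact series-parallel groupings are assumptions for illustration; the Chapman-Kolmogorov treatment in the paper is more general.

```python
# Toy availability comparison for parallel vs series-parallel configurations,
# assuming independent units with exponential failure rate lam and repair
# rate mu (all values illustrative).
lam, mu = 0.01, 0.5                 # failure and repair rates, per hour
A = mu / (lam + mu)                 # steady-state availability of one unit

parallel_2 = 1 - (1 - A) ** 2       # two units in parallel (config I)
parallel_4 = 1 - (1 - A) ** 4       # four units in parallel (config II)
sp_4 = (1 - (1 - A) ** 2) ** 2      # two parallel pairs in series (one
                                    # possible reading of config III)
sp_6 = (1 - (1 - A) ** 3) ** 2      # two parallel triples in series (one
                                    # possible reading of config IV)

mttf_parallel_2 = 3 / (2 * lam)     # classic 2-unit parallel MTTF, no repair

print(f"A(unit) = {A:.4f}")
print(f"A: I={parallel_2:.8f}  II={parallel_4:.10f}  III={sp_4:.8f}  IV={sp_6:.8f}")
print(f"MTTF of two parallel units without repair: {mttf_parallel_2:.0f} h")
```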
A novel visual hardware behavioral language
NASA Technical Reports Server (NTRS)
Li, Xueqin; Cheng, H. D.
1992-01-01
Most hardware behavioral languages use only text to describe the behavior of the desired hardware design. This is inconvenient for VLSI designers who prefer the schematic approach. The proposed visual hardware behavioral language can graphically express design information using visual parallel models (blocks), visual sequential models (processes), and visual data flow graphs (which consist of primitive operational icons, control icons, and Data and Synchro links). Thus, the proposed visual hardware behavioral language can not only specify hardware concurrent and sequential functionality, but can also visually expose parallelism, sequentiality, and disjointness (mutually exclusive operations) to hardware designers. This enables designers to capture design ideas easily and explicitly.
Chen, Diane; Drabick, Deborah A G; Burgers, Darcy E
2015-12-01
Peer rejection and deviant peer affiliation are linked consistently to the development and maintenance of conduct problems. Two proposed models may account for longitudinal relations among these peer processes and conduct problems: the (a) sequential mediation model, in which peer rejection in childhood and deviant peer affiliation in adolescence mediate the link between early externalizing behaviors and more serious adolescent conduct problems; and (b) parallel process model, in which peer rejection and deviant peer affiliation are considered independent processes that operate simultaneously to increment risk for conduct problems. In this review, we evaluate theoretical models and evidence for associations among conduct problems and (a) peer rejection and (b) deviant peer affiliation. We then consider support for the sequential mediation and parallel process models. Next, we propose an integrated model incorporating both the sequential mediation and parallel process models. Future research directions and implications for prevention and intervention efforts are discussed.
NASA Technical Reports Server (NTRS)
Schumacher, W.; Geiser, G.
1978-01-01
The basic concepts of Petri nets are reviewed, as well as their application as the fundamental model of technical systems with concurrent discrete events, such as hardware systems and software models of computers. The use of Petri nets is proposed for modeling the human operator dealing with concurrent discrete tasks. Their properties useful in modeling the human operator are discussed and practical examples are given. By means of an experimental investigation of binary concurrent tasks presented in a serial manner, the representation of human behavior by Petri nets is demonstrated.
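To make the modeling idea concrete, here is a tiny Petri-net interpreter: places hold tokens, a transition fires when all of its input places are marked, and concurrency appears as transitions enabled on disjoint inputs. The two-task net below is an invented example, not one from the paper.

```python
# Minimal Petri net: transition -> (input places, output places).
net = {
    "start_A":  ({"ready_A"}, {"busy_A"}),
    "start_B":  ({"ready_B"}, {"busy_B"}),
    "finish_A": ({"busy_A"},  {"done"}),
    "finish_B": ({"busy_B"},  {"done"}),
}
marking = {"ready_A": 1, "ready_B": 1, "busy_A": 0, "busy_B": 0, "done": 0}

def enabled(t):
    """A transition is enabled when every input place holds a token."""
    return all(marking[p] > 0 for p in net[t][0])

def fire(t):
    """Firing consumes input tokens and produces output tokens."""
    assert enabled(t)
    for p in net[t][0]:
        marking[p] -= 1
    for p in net[t][1]:
        marking[p] += 1

# start_A and start_B have disjoint inputs: the two tasks are concurrent.
print([t for t in net if enabled(t)])     # -> ['start_A', 'start_B']
for t in ("start_A", "start_B", "finish_A", "finish_B"):
    fire(t)
print(marking)                            # -> two tokens in 'done'
```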
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
NASA Technical Reports Server (NTRS)
Goldstein, David
1991-01-01
Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
Resonance-induced sensitivity enhancement method for conductivity sensors
NASA Technical Reports Server (NTRS)
Tai, Yu-Chong (Inventor); Shih, Chi-yuan (Inventor); Li, Wei (Inventor); Zheng, Siyang (Inventor)
2009-01-01
Methods and systems for improving the sensitivity of a variety of conductivity sensing devices, in particular capacitively-coupled contactless conductivity detectors. A parallel inductor is added to the conductivity sensor. The sensor with the parallel inductor is operated at a resonant frequency of the equivalent circuit model. At the resonant frequency, parasitic capacitances that are either in series or in parallel with the conductance (and possibly a series resistance) are substantially removed from the equivalent circuit, leaving a purely resistive impedance. An appreciably higher sensor sensitivity results. Experimental verification shows that sensitivity improvements of the order of 10,000-fold are possible. Examples of detecting particulates with high precision by application of the apparatus and methods of operation are described.
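The mechanism is the ordinary parallel-LC resonance condition: at f0 = 1/(2π√(LC)) the inductive and capacitive reactances cancel, leaving a purely resistive impedance. The component values in the short check below are assumptions for illustration, not figures from the patent.

```python
# Resonant operating frequency of the sensor's equivalent parallel LC circuit.
import math

L = 100e-6    # parallel inductor, 100 uH (assumed value)
C = 10e-12    # parasitic capacitance, 10 pF (assumed value)

f0 = 1 / (2 * math.pi * math.sqrt(L * C))
print(f"operate the sensor near f0 = {f0 / 1e6:.2f} MHz")
# At f0 the reactances of L and C cancel, so the measured impedance is
# (ideally) the purely resistive solution conductance.
```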
Zhang, Ziheng; Martin, Jonathan; Wu, Jinfeng; Wang, Haijiang; Promislow, Keith; Balcom, Bruce J
2008-08-01
Water management is critical to optimize the operation of polymer electrolyte membrane fuel cells. At present, numerical models are employed to guide water management in such fuel cells. Accurate measurements of water content variation in polymer electrolyte membrane fuel cells are required to validate these models and to optimize fuel cell behavior. We report a direct water content measurement across the Nafion membrane in an operational polymer electrolyte membrane fuel cell, employing double half k-space spin echo single point imaging techniques. The MRI measurements with T2 mapping were undertaken with a parallel plate resonator to avoid the effects of RF screening. The parallel plate resonator employs the electrodes inherent to the fuel cell to create a resonant circuit at RF frequencies for MR excitation and detection, while still operating as a conventional fuel cell at DC. Three stages of fuel cell operation were investigated: activation, operation and dehydration. Each profile was acquired in 6 min, with 6 μm nominal resolution and an SNR better than 15.
Multibus-based parallel processor for simulation
NASA Technical Reports Server (NTRS)
Ogrady, E. P.; Wang, C.-H.
1983-01-01
A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from undermining performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
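A toy demonstration of latency tolerance by overlap, in the spirit of the prefetching techniques listed above: a single I/O thread fetches the next data block while the main thread computes on the current one. The timings are artificial stand-ins for memory latency and computation.

```python
# Hiding fetch latency behind computation via one-block-ahead prefetching.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(i):                      # stand-in for a long-latency remote reference
    time.sleep(0.05)
    return list(range(i * 4, (i + 1) * 4))

def compute(block):                # local computation of comparable duration
    time.sleep(0.05)
    return sum(v * v for v in block)

t0 = time.perf_counter()
total = 0
with ThreadPoolExecutor(max_workers=1) as io:
    nxt = io.submit(fetch, 0)      # prefetch the first block
    for i in range(1, 8):
        block = nxt.result()       # wait only if the prefetch is not done yet
        nxt = io.submit(fetch, i)  # fetch block i while computing block i-1
        total += compute(block)
    total += compute(nxt.result())
print(total, f"{time.perf_counter() - t0:.2f}s vs ~0.80s fully serialized")
```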
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Biocellion: accelerating computer simulation of multicellular biological system models
Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya
2014-01-01
Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572
Distribution Locational Real-Time Pricing Based Smart Building Control and Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hao, Jun; Dai, Xiaoxiao; Zhang, Yingchen
This paper proposes a real-virtual parallel computing scheme for smart building operations aimed at augmenting overall social welfare. The University of Denver's campus power grid and Ritchie fitness center are used to demonstrate the proposed approach. An artificial virtual system is built in parallel to the real physical system to evaluate the overall social cost of building operation, based on a social-science-based working productivity model, a numerical-experiment-based building energy consumption model, and a power-system-based real-time pricing mechanism. Through interactive feedback exchanged between the real and virtual systems, enlarged social welfare, including monetary cost reduction and energy saving as well as working productivity improvements, can be achieved.
NASA Astrophysics Data System (ADS)
Tabekina, N. A.; Chepchurov, M. S.; Evtushenko, E. I.; Dmitrievsky, B. S.
2018-05-01
This work addresses the automation of a machining process, namely turning, to produce parts having planes parallel to the part's axis of rotation without the use of special tools. According to the results, equipping a lathe with a high-speed electromechanical drive to control its operative movements makes it possible to obtain planes parallel to the part axis. The method for obtaining planes parallel to the part axis is based on a mathematical model expressed as a functional dependency between the conveying velocity of the driven element and time; it describes the operative movements of the lathe over the entire tool path. Using the model of tool movement, it was found that the conveying velocity varies from its maximum to zero, which allows a reversal of the drive. A scheme of tool placement relative to the workpiece is proposed for unidirectional movement of the driven element at high conveying velocity. The control method for CNC machines can be used to produce geometrically complex parts on a lathe without special milling tools.
Ferrucci, Filomena; Salza, Pasquale; Sarro, Federica
2017-06-29
The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GAs solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid model, also in terms of costs when executed on a commercial cloud provider.
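The island model itself is independent of Hadoop and fits in a few lines. The sketch below runs four subpopulations on a toy one-max problem with ring migration of each island's best individual; it illustrates the model only, whereas the paper executes the islands as MapReduce tasks on a cluster.

```python
# Island-model GA sketch: independent subpopulations, periodic elite migration.
import random

def fitness(bits):                       # toy one-max problem
    return sum(bits)

def evolve(pop, gens=10, pm=0.02):
    """Truncation selection, one-point crossover, bit-flip mutation."""
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: len(pop) // 2]
        children = []
        while len(children) < len(pop) - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(len(a))
            child = a[:cut] + b[cut:]
            children.append([1 - g if random.random() < pm else g for g in child])
        pop = elite + children
    return pop

random.seed(1)
islands = [[[random.randint(0, 1) for _ in range(32)] for _ in range(20)]
           for _ in range(4)]
for epoch in range(5):
    islands = [evolve(pop) for pop in islands]   # islands evolve independently
    for i, pop in enumerate(islands):            # ring migration of the best
        best = max(pop, key=fitness)
        islands[(i + 1) % len(islands)].append(list(best))
print("best fitness:", max(fitness(ind) for pop in islands for ind in pop))
```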
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives
NASA Astrophysics Data System (ADS)
Usha, S.; Subramani, C.
2018-04-01
Generally, induction motors are highly non-linear and have complex time-varying dynamics. This makes the speed control of an induction motor a challenging issue in industry. However, with recent advances in power electronic devices and intelligent controllers, speed control of induction motors can be achieved while accounting for these non-linear characteristics. Conventionally, a single inverter is used to run one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the drive. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions; hence, speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed with a PID controller and a fuzzy logic controller. In this paper, the speed responses of an induction motor rated 1 HP, 1440 rpm, and 50 Hz with these controllers are compared in terms of time-domain specifications. Stability analysis of the system is also performed under low speed using the MATLAB platform. A hardware model is developed for speed control using the fuzzy logic controller, which exhibits superior performance over the other controller.
MPF: A portable message passing facility for shared memory multiprocessors
NASA Technical Reports Server (NTRS)
Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.
1987-01-01
The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Broadcasting collective operation contributions throughout a parallel computer
Faraj, Ahmad [Rochester, MN
2012-02-21
Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
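Stripped of the hardware specifics, the claimed broadcast is a two-phase pattern: share contributions inside each compute node first, then let each processor take its turn on the node's network link. The following simulation uses plain dictionaries to stand in for nodes, processors, and messages; it is not the patent's actual messaging API.

```python
# Two-phase broadcast of collective contributions (simulated, not real MPI).
nodes = {0: {"p0": "a", "p1": "b"},      # node -> processor -> contribution
         1: {"p2": "c", "p3": "d"}}

# Phase 1: intra-node exchange (shared memory); each node's view now holds
# all contributions produced on that node.
view = {n: dict(procs) for n, procs in nodes.items()}

# Phase 2: each processor, in a serial transmission sequence, uses its
# node's designated link to send its contribution to the other nodes.
for src, procs in nodes.items():
    for proc in sorted(procs):           # the serial processor sequence
        for dst in nodes:
            if dst != src:
                view[dst][proc] = procs[proc]    # inter-node transmission

assert all(v == {"p0": "a", "p1": "b", "p2": "c", "p3": "d"}
           for v in view.values())
print(view)
```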
Parallel processing optimization strategy based on MapReduce model in cloud storage environment
NASA Astrophysics Data System (ADS)
Cui, Jianming; Liu, Jiayi; Li, Qiuyan
2017-05-01
Currently, many documents in cloud storage are processed by packaging them only after all packets have been received. In this stored procedure, from the local transmitter to the server, packing and unpacking consume a great deal of time, and transmission efficiency is low. A new parallel processing algorithm is proposed to optimize the transmission mode. Following the MapReduce model of work, the algorithm uses MPI technology to execute the Mapper and Reducer mechanisms in parallel. Simulation experiments on the Hadoop cloud computing platform show that this algorithm not only accelerates the file transfer rate but also shortens the waiting time of the Reducer mechanism. It breaks through traditional sequential transmission constraints and reduces storage coupling, thereby improving transmission efficiency.
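Independent of the MPI details, the Mapper/Reducer overlap the abstract aims at can be sketched with a process pool: mapper outputs are consumed as they arrive rather than after a global barrier. The packet format and the byte-counting reducer below are invented for illustration.

```python
# Parallel mappers feeding a reducer without waiting for all packets.
from collections import defaultdict
from multiprocessing import Pool

def mapper(packet):
    """Emit (file, size) pairs for one received packet."""
    return [(packet["file"], len(packet["data"]))]

def reducer(pairs):
    """Accumulate transmitted bytes per file."""
    totals = defaultdict(int)
    for key, size in pairs:
        totals[key] += size
    return dict(totals)

if __name__ == "__main__":
    packets = [{"file": f"f{i % 3}", "data": "x" * (i + 1)} for i in range(9)]
    with Pool(4) as pool:
        # imap_unordered yields results as mappers finish, so reduction can
        # begin without a packaging/unpacking barrier at the end.
        pairs = [kv for out in pool.imap_unordered(mapper, packets) for kv in out]
    print(reducer(pairs))    # -> {'f0': 12, 'f1': 15, 'f2': 18}
```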
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution in which one directly executes the application code but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Performance of a parallel code for the Euler equations on hypercube computers
NASA Technical Reports Server (NTRS)
Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.
1990-01-01
The performance of hypercubes was evaluated on a computational fluid dynamics problem, and consideration was given to the parallel environment issues that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
Reusable Rocket Engine Operability Modeling and Analysis
NASA Technical Reports Server (NTRS)
Christenson, R. L.; Komar, D. R.
1998-01-01
This paper describes the methodology, model, input data, and analysis results of a reusable launch vehicle engine operability study conducted with the goal of supporting design from an operations perspective. Paralleling performance analyses in schedule and method, this requires the use of metrics in a validated operations model useful for design, sensitivity, and trade studies. Operations analysis in this view is one of several design functions. An operations concept was developed for a given engine concept, and the predicted operations and maintenance processes were incorporated into simulation models. Historical operations data at a level of detail suitable to the model objectives were collected, analyzed, and formatted for use with the models; the simulations were run; and the results were collected and presented. The input data included scheduled and unscheduled timeline and resource information collected into a Space Transportation System (STS) Space Shuttle Main Engine (SSME) historical launch operations database. The results reflect the importance not only of reliable hardware but also of improvements to operations and corrective maintenance processes.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolution in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix-vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.
On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic
Time-domain simulations are heavily used in today's planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies, in order to comply with industry standards. Because of increased modeling complexity, completing a dynamic simulation of a large-scale model is several times slower than real time for state-of-the-art commercial packages. With the growing stochastic behavior introduced by emerging technologies, the power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model library in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored, such as parallelizing the calculation of generator current injection, identifying fast linear solvers for the network solution, and parallelizing data outputs when interacting with the APIs of the commercial package TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.
NASA Astrophysics Data System (ADS)
Xie, Lizhe; Hu, Yining; Chen, Yang; Shi, Luyao
2015-03-01
Projection and back-projection are the most computationally consuming parts of Computed Tomography (CT) reconstruction. Parallelization strategies using GPU computing techniques have been introduced. In this paper we present a new parallelization scheme for both projection and back-projection. The proposed method is based on the CUDA technology provided by NVIDIA Corporation. Instead of building a complex model, we aimed at optimizing the existing algorithm and making it suitable for CUDA implementation so as to gain fast computation speed. Besides making use of the texture fetching operation, which helps achieve faster interpolation, we fixed the number of samples in the computation of the projection to ensure the synchronization of blocks and threads, thus preventing the latency caused by inconsistent computation complexity. Experimental results demonstrate the computational efficiency and imaging quality of the proposed method.
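A CPU/NumPy stand-in makes the fixed-sampling argument visible: every ray takes exactly the same number of interpolation samples, so parallel threads would stay in lockstep. The parallel-beam geometry and the phantom are simplifications for illustration, not the paper's CUDA kernel.

```python
# Forward projection with a fixed sample count per ray (parallel-beam toy).
import numpy as np

def forward_project(image, n_rays=64, n_samples=128):
    h, w = image.shape
    xs = np.linspace(0, w - 1, n_rays)        # one vertical ray per detector bin
    ts = np.linspace(0, h - 1, n_samples)     # identical sampling grid per ray
    proj = np.zeros(n_rays)
    rows = np.clip(np.floor(ts).astype(int), 0, h - 2)
    fy = ts - rows
    for i, x in enumerate(xs):                # in CUDA, one thread per ray
        x0 = min(int(np.floor(x)), w - 2)
        fx = x - x0
        # bilinear interpolation (the texture-fetch analogue)
        val = ((1 - fx) * (1 - fy) * image[rows, x0]
               + fx * (1 - fy) * image[rows, x0 + 1]
               + (1 - fx) * fy * image[rows + 1, x0]
               + fx * fy * image[rows + 1, x0 + 1])
        proj[i] = val.sum() * (h / n_samples) # scale by the step length
    return proj

phantom = np.zeros((64, 64))
phantom[24:40, 24:40] = 1.0                   # a 16x16 square insert
print(forward_project(phantom)[28:36])        # rays through the square: ~16
```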
Chen, Weiliang; De Schutter, Erik
2017-01-01
Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Solving Navier-Stokes equations on a massively parallel processor; The 1 GFLOP performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saati, A.; Biringen, S.; Farhat, C.
This paper reports on experience in solving large-scale fluid dynamics problems on the Connection Machine model CM-2. The authors have implemented a parallel version of the MacCormack scheme for the solution of the Navier-Stokes equations. By using triad floating point operations and reducing the number of interprocessor communications, they have achieved a sustained performance rate of 1.42 GFLOPS.
NASA Astrophysics Data System (ADS)
Lu, Qiheng; Feng, Xiaoyun
2013-03-01
After analyzing the working principle of the four-aspect fixed autoblock system, an energy-saving control model was created based on the dynamics equations of the trains in order to study the energy-saving optimal control strategy for trains in following operation. Besides safety and punctuality, the main objectives of the model were energy consumption and time error. Based on this model, the static and dynamic speed constraints under a four-aspect fixed autoblock system were put forward. A multi-dimensional parallel genetic algorithm (GA) with an external penalty function was adopted to solve this problem. By using real-number coding and a strategy of dividing ramps into three parts, the convergence of the GA was accelerated and the length of the chromosomes was shortened. A vector of zero-mean Gaussian random disturbances was superposed on the mutation operator. The simulation results show that the method can reduce energy consumption effectively while maintaining safety and punctuality.
Brian hears: online auditory processing using vectorization over channels.
Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
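The vectorization idea can be shown with a plain NumPy filterbank: the filter state of all channels lives in arrays, so each per-sample update is a single vector operation across the whole bank. The band-pass biquads below (RBJ design) are a stand-in for the gammatone-style filters used in real cochlear models.

```python
# One second-order band-pass filter per channel, vectorized over channels.
import numpy as np

fs = 44100.0
cf = np.logspace(np.log10(100), np.log10(8000), 50)   # 50 centre frequencies
w0 = 2 * np.pi * cf / fs
alpha = np.sin(w0) / (2 * 5.0)                        # Q = 5
b0, b2 = alpha, -alpha                                # RBJ band-pass biquad
a0, a1, a2 = 1 + alpha, -2 * np.cos(w0), 1 - alpha    # (b1 = 0)

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)                         # input sound: noise
x1 = np.zeros(cf.size); x2 = np.zeros(cf.size)        # per-channel filter state
y1 = np.zeros(cf.size); y2 = np.zeros(cf.size)
out = np.empty((x.size, cf.size))
for n, xn in enumerate(x):
    # one vector operation advances all 50 channels by one sample
    y = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
    x2, x1 = x1, np.full(cf.size, xn)
    y2, y1 = y1, y
    out[n] = y
print(out.shape, out.std(axis=0)[:5])                 # (2048, 50) + band powers
```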
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
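The fictitious-cell exchange at the heart of the decomposition reduces to a one-dimensional toy: two subdomains each carry one ghost cell that is refreshed from the neighbour before every smoothing sweep. Plain array copies stand in for the interprocessor messages of the real implementation.

```python
# Two-subdomain Jacobi smoothing with fictitious (ghost) cells.
import numpy as np

field = np.zeros(10); field[-1] = 1.0        # Dirichlet boundaries: 0 and 1
left = np.empty(6);  left[:5] = field[:5]    # 5 interior cells + right ghost
right = np.empty(6); right[1:] = field[5:]   # left ghost + 5 interior cells

for _ in range(200):
    left[5] = right[1]                       # refresh fictitious cells ...
    right[0] = left[4]                       # ... (stands in for MPI exchange)
    left[1:5] = 0.5 * (left[0:4] + left[2:6])     # smooth interior of left
    right[1:5] = 0.5 * (right[0:4] + right[2:6])  # smooth interior of right

# The stitched field approaches the linear ramp between the two boundaries,
# confirming the ghost-cell exchange keeps the subdomains consistent.
print(np.concatenate([left[:5], right[1:]]).round(3))
```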
NASA Astrophysics Data System (ADS)
Zerr, Robert Joseph
2011-12-01
The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing numbers of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with its own transport matrices, coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have been investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of thousands of processors. The PGS method does outperform SI DSA for the periodic heterogeneous layers (PHL) configuration problems. Although this demonstrates a relative strength/weakness between the two methods, the practicality of these problems is much less, further limiting instances where it would be beneficial to select ITMM over SI DSA. The results strongly indicate a need for a robust, stable, and efficient acceleration method (or preconditioner for PGMRES). The spatial multigrid (SMG) method is currently incomplete in that it does not work for all cases considered and does not effectively improve the convergence rate for all values of scattering ratio c or cell dimension h. Nevertheless, it does display the desired trend for highly scattering, optically thin problems. That is, it tends to lower the rate of growth of the number of iterations with increasing number of processes, P, while not increasing the number of additional operations per iteration to the extent that the total execution time of the rapidly converging accelerated iterations exceeds that of the slower unaccelerated iterations. A predictive parallel performance model has been developed for the PBJ method. Timing tests were performed such that trend lines could be fitted to the data for the different components and used to estimate the execution times.
Applied to the weak scaling results, the model notably underestimates construction time, but combined with a slight overestimation in iterative solution time, the model predicts total execution time very well for large P. It also does a decent job with the strong scaling results, closely predicting the construction time and time per iteration, especially as P increases. Although not shown to be competitive up to 1,024 processing elements with the current state of the art, the parallelized ITMM exhibits promising scaling trends. Ultimately, compared to the KBA method, the parallelized ITMM may be found to be a very attractive option for transport calculations spatially decomposed over several tens of thousands of processes. Acceleration/preconditioning of the parallelized ITMM once developed will improve the convergence rate and improve its competitiveness. (Abstract shortened by UMI.)
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Lin, C. T.
1989-01-01
The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme for mapping these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics, including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirements, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to implementation on a single-instruction-stream multiple-data-stream (SIMD) computer with a reconfigurable interconnection network. The model of a reconfigurable dual-network SIMD machine with internal direct feedback is introduced, and a systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
Cellular automata with object-oriented features for parallel molecular network modeling.
Zhu, Hao; Wu, Yinghui; Huang, Sui; Sun, Yan; Dhar, Pawan
2005-06-01
Cellular automata are an important modeling paradigm for studying the dynamics of large, parallel systems composed of multiple, interacting components. However, to model biological systems, cellular automata need to be extended beyond the large-scale parallelism and intensive communication in order to capture two fundamental properties characteristic of complex biological systems: hierarchy and heterogeneity. This paper proposes extensions to a cellular automata language, Cellang, to meet this purpose. The extended language, with object-oriented features, can be used to describe the structure and activity of parallel molecular networks within cells. Capabilities of this new programming language include object structure to define molecular programs within a cell, floating-point data type and mathematical functions to perform quantitative computation, message passing capability to describe molecular interactions, as well as new operators, statements, and built-in functions. We discuss relevant programming issues of these features, including the object-oriented description of molecular interactions with molecule encapsulation, message passing, and the description of heterogeneity and anisotropy at the cell and molecule levels. By enabling the integration of modeling at the molecular level with system behavior at cell, tissue, organ, or even organism levels, the program will help improve our understanding of how complex and dynamic biological activities are generated and controlled by parallel functioning of molecular networks. Index Terms-Cellular automata, modeling, molecular network, object-oriented.
Planning paths through a spatial hierarchy - Eliminating stair-stepping effects
NASA Technical Reports Server (NTRS)
Slack, Marc G.
1989-01-01
Stair-stepping effects are a result of the loss of spatial continuity resulting from the decomposition of space into a grid. This paper presents a path planning algorithm which eliminates stair-stepping effects induced by the grid-based spatial representation. The algorithm exploits a hierarchical spatial model to efficiently plan paths for a mobile robot operating in dynamic domains. The spatial model and path planning algorithm map to a parallel machine, allowing the system to operate incrementally, thereby accounting for unexpected events in the operating space.
Wang, Zhaocai; Ji, Zuwen; Wang, Xiaoming; Wu, Tunhua; Huang, Wei
2017-12-01
As a promising approach to computationally intractable problems, DNA computing is an emerging research area spanning mathematics, computer science, and molecular biology. The task scheduling problem, a well-known NP-complete problem, assigns n jobs to m individuals and seeks to minimize the completion time of the last finished individual. In this paper, we use a biologically inspired computational model and describe a new parallel algorithm that solves the task scheduling problem by basic DNA molecular operations. We design flexible-length DNA strands to represent the elements of the allocation matrix, apply appropriate biological experimental operations, and obtain solutions of the task scheduling problem in the proper length range with less than O(n²) time complexity. Copyright © 2017. Published by Elsevier B.V.
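For reference, the conventional (exponential-time) formulation of the same problem is compact: enumerate all m^n allocations and keep the one with the smallest makespan. The job list below is invented; the DNA-computing approach trades this enumeration time for massive molecular parallelism.

```python
# Brute-force task scheduling: minimise the finish time of the last machine.
from itertools import product

jobs = [3, 5, 2, 7, 4]         # processing times (illustrative instance)
m = 2                          # number of machines/individuals

best = None
for assign in product(range(m), repeat=len(jobs)):   # all m**n allocations
    loads = [0] * m
    for job, machine in zip(jobs, assign):
        loads[machine] += job                        # allocation matrix row
    makespan = max(loads)
    if best is None or makespan < best[0]:
        best = (makespan, assign)
print(best)                    # -> (11, ...): optimal makespan for this instance
```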
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Tolerant (parallel) Programming
NASA Technical Reports Server (NTRS)
DiNucci, David C.; Bailey, David H. (Technical Monitor)
1997-01-01
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. This paper first defines the term "tolerant programming", then describes many layers of tools to accomplish it. The primary focus is on F-Nets, a formal model for expressing computation as a folded partial-ordering of operations, thereby providing an architecture-independent expression of tolerant parallel algorithms. For implementing F-Nets, Cooperative Data Sharing (CDS) is a subroutine package that implements communication efficiently in a large number of environments (e.g. shared memory and message passing). Software Cabling (SC), a very-high-level graphical programming language for building large F-Nets, possesses many of the features normally expected from today's computer languages (e.g. data abstraction, array operations). Finally, L2³ is a CASE tool which facilitates the construction, compilation, execution, and debugging of SC programs.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application programming interface that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS, based on pthreads [POSIX Threads, where "POSIX" denotes the Portable Operating System Interface]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
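The near-linear ray-tracing speedup follows from rays being mutually independent, so the trace loop parallelizes like an OpenMP parallel-for. A rough Python analogue of that pattern is sketched below (processes rather than threads, since CPython threads serialize pure-Python compute); trace_ray is a hypothetical stand-in, not MACOS code.

```python
from concurrent.futures import ProcessPoolExecutor

def trace_ray(ray_id):
    """Stand-in for one ray trace: rays are independent of one another,
    which is why loop-level parallelism gives near-linear speedup."""
    x = float(ray_id)
    for surface in range(10_000):            # march through optical surfaces
        x = (x * 1.000001 + 0.5) % 1024.0
    return x

if __name__ == "__main__":
    rays = range(1_000)
    with ProcessPoolExecutor() as pool:      # analogous to an OpenMP parallel-for
        results = list(pool.map(trace_ray, rays, chunksize=100))
    print(len(results), "rays traced")
```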
PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit
NASA Technical Reports Server (NTRS)
MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; deFainchtein, Rosalinda; Packer, Charles
1999-01-01
In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code, and then later if they so desire, adding adaptivity.
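In its simplest use as a domain decomposition tool, the core operation is carving the logically cartesian mesh into per-processor blocks with guard (ghost) cells. A minimal 1-D sketch of that bookkeeping follows, with illustrative names only; PARAMESH itself is Fortran 90 and additionally handles adaptive refinement.

```python
import numpy as np

def decompose(n_cells, n_ranks):
    """1-D block decomposition of a logically cartesian mesh:
    each rank gets a contiguous slab plus one guard (ghost) cell
    on each side, in the spirit of a uniformly refined parallel code."""
    bounds = np.linspace(0, n_cells, n_ranks + 1, dtype=int)
    blocks = []
    for r in range(n_ranks):
        lo, hi = bounds[r], bounds[r + 1]
        blocks.append((max(lo - 1, 0), min(hi + 1, n_cells)))  # with ghosts
    return blocks

print(decompose(100, 4))   # [(0, 26), (24, 51), (49, 76), (74, 100)]
```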
Integration of Modelling and Graphics to Create an Infrared Signal Processing Test Bed
NASA Astrophysics Data System (ADS)
Sethi, H. R.; Ralph, John E.
1989-03-01
The work reported in this paper was carried out as part of a contract with MoD (PE) UK. It considers the problems associated with realistic modelling of a passive infrared system in an operational environment. Ideally all aspects of the system and environment should be integrated into a complete end-to-end simulation, but in the past limited computing power has prevented this. Recent developments in workstation technology and the increasing availability of parallel processing techniques make the end-to-end simulation possible. However, the complexity and speed of such simulations mean difficulties for the operator in controlling the software and understanding the results. These difficulties can be greatly reduced by providing an extremely user-friendly interface and a very flexible, high-power, high-resolution colour graphics capability. Most system modelling is based on separate software simulation of the individual components of the system itself and its environment. These component models may have their own characteristic inbuilt assumptions and approximations, may be written in the language favoured by the originator, and may have a wide variety of input and output conventions and requirements. The models and their limitations need to be matched to the range of conditions appropriate to the operational scenario. A comprehensive set of databases needs to be generated by the component models and these databases must be made readily available to the investigator. Performance measures need to be defined and displayed in some convenient graphics form. Some options are presented for combining available hardware and software to create an environment within which the models can be integrated, and which provides the required man-machine interface, graphics, and computing power. The impact of massively parallel processing and artificial intelligence is discussed. Parallel processing will make real-time end-to-end simulation possible and will greatly improve the graphical visualisation of the model output data. Artificial intelligence should help to enhance the man-machine interface.
NASA Astrophysics Data System (ADS)
Wanguang, Sun; Chengzhen, Li; Baoshan, Fan
2018-06-01
Rivers dry up most frequently in the West Liaohe River plain, and the bare riverbeds form belts of fine sand. These sand belts, which yield heavy dust on windy days, put substantial stress on the local environment as the riverbeds are eroded by wind. Optimal operation of water resources is therefore one of the most important methods for preventing wind erosion of riverbeds. In this paper, an optimal operation model for water resources based on riverbed wind erosion control is established, comprising an objective function, constraints, and a solution method. The objective function considers factors including the water volume diverted into reservoirs, river length, and a lower threshold of flow rate. On the basis of ensuring the water requirement of each reservoir, destruction of riverbed vegetation by frequent river flow is avoided. A multi-core parallel solving method for optimal water resources operation in the West Liaohe River plain is proposed, in which the optimal solution is found by the DPSA method under the POA framework and the parallel computing program is designed in Fork/Join mode. Based on the optimal operation results, the basic rules of water resources operation in the West Liaohe River plain are summarized. Calculation results show that, while meeting the water volume requirement of every reservoir, the frequency of river flow in the reach of the Xinkai River from Taihekou to the Talagan Water Diversion Project is reduced effectively. The speedup and parallel efficiency of the parallel algorithm are 1.51 and 0.76 respectively, and the computing time is significantly decreased. The research results in this paper can provide technical support for the prevention and control of riverbed wind erosion in the West Liaohe River plain.
An Asynchronous Many-Task Implementation of In-Situ Statistical Analysis using Legion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
2015-11-01
In this report, we propose a framework for the design and implementation of in-situ analyses using an asynchronous many-task (AMT) model, using the Legion programming model together with the MiniAero mini-application as a surrogate for full-scale parallel scientific computing applications. The bulk of this work consists of converting the Learn/Derive/Assess model, which we had initially developed for parallel statistical analysis using MPI [PTBM11], from an SPMD to an AMT model. To this end, we propose an original use of the concept of Legion logical regions as a replacement for the parallel communication schemes used for the only operation of the statistics engines that requires explicit communication. We then evaluate this proposed scheme in a shared memory environment, using the Legion port of MiniAero as a proxy for a full-scale scientific application, as a means to provide input data sets of variable size for the in-situ statistical analyses in an AMT context. We demonstrate in particular that the approach has merit, and warrants further investigation, in collaboration with ongoing efforts to improve the overall parallel performance of the Legion system.
Computing Models for FPGA-Based Accelerators
Herbordt, Martin C.; Gu, Yongfeng; VanCourt, Tom; Model, Josh; Sukhwani, Bharat; Chiu, Matt
2011-01-01
Field-programmable gate arrays are widely considered as accelerators for compute-intensive applications. A critical phase of FPGA application development is finding and mapping to the appropriate computing model. FPGA computing enables models with highly flexible fine-grained parallelism and associative operations such as broadcast and collective response. Several case studies demonstrate the effectiveness of using these computing models in developing FPGA applications for molecular modeling. PMID:21603152
Broadcasting a message in a parallel computer
Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point-to-point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
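On a 2-D mesh, one concrete Hamiltonian path through a plane is the boustrophedon ("snake") ordering, and broadcasting then reduces to forwarding along that path. A toy sketch under that assumption (the patent does not mandate this particular path):

```python
def snake_path(rows, cols):
    """Hamiltonian path through one plane of a 2-D mesh:
    boustrophedon ('snake') ordering visits every node exactly once."""
    path = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path

def broadcast(message, rows, cols):
    # the logical root forwards along the path; each hop would be a
    # point-to-point send on the mesh network described above
    delivered = {}
    for node in snake_path(rows, cols):
        delivered[node] = message
    return delivered

print(list(broadcast("msg", 3, 4)))   # root (0,0) first; path ends at (2,3)
```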
Parallelization of the Physical-Space Statistical Analysis System (PSAS)
NASA Technical Reports Server (NTRS)
Larson, J. W.; Guo, J.; Lyster, P. M.
1999-01-01
Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128 MB) run-time database of parameters used in the calculation of these operators. Implementing a message-passing parallel computing paradigm in an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational reproducibility is well known in the parallel computing community. It is a requirement that the parallel code perform calculations in a fashion that will yield identical results on different configurations of processing elements on the same platform. In some cases this problem can be solved by sacrificing performance. Meeting this requirement and still achieving high performance is very difficult. Topics to be discussed include: current PSAS design and parallelization strategy; reproducibility issues; load balance vs. database memory demands; and possible solutions to these problems.
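The innovation-equation solver named above is a standard preconditioned conjugate gradient iteration. As a reference for that step, here is a compact, dense-matrix PCG in Python with a Jacobi preconditioner; PSAS itself applies factored sparse operators on a distributed observation grid, so this is only the mathematical skeleton, with illustrative names.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients for A x = b, A symmetric
    positive definite; M_inv is an (approximate) inverse preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv @ r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
M_inv = np.diag(1.0 / np.diag(A))             # Jacobi preconditioner
print(pcg(A, np.array([1.0, 2.0]), M_inv))    # approx [0.0909, 0.6364]
```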
GraphReduce: Processing Large-Scale Graphs on Accelerator-Based Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Song, Shuaiwen; Agarwal, Kapil
2015-11-15
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device's internal memory capacity. GraphReduce adopts a combination of edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and device.
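The Gather-Apply-Scatter model it implements is easy to state in miniature: gather contributions along in-edges, apply an update per vertex, scatter activations to neighbours. One superstep, sketched sequentially in Python with a PageRank-like averaging rule (GraphReduce's actual engine is GPU-based and out-of-core; all names here are illustrative):

```python
def gas_superstep(vertices, edges, values):
    """One Gather-Apply-Scatter superstep over a directed graph."""
    # Gather: each vertex accumulates contributions along its in-edges
    gathered = {v: 0.0 for v in vertices}
    for src, dst in edges:
        gathered[dst] += values[src]
    # Apply: combine the gathered data with the old vertex value
    new_values = {v: 0.5 * values[v] + 0.5 * gathered[v] for v in vertices}
    # Scatter: a no-op here; a real engine would activate neighbours
    return new_values

verts = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("c", "a")]
print(gas_superstep(verts, edges, {"a": 1.0, "b": 0.0, "c": 0.0}))
```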
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R
Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
Integrated Joule switches for the control of current dynamics in parallel superconducting strips
NASA Astrophysics Data System (ADS)
Casaburi, A.; Heath, R. M.; Cristiano, R.; Ejrnaes, M.; Zen, N.; Ohkubo, M.; Hadfield, R. H.
2018-06-01
Understanding and harnessing the physics of the dynamic current distribution in parallel superconducting strips holds the key to creating next generation sensors for single molecule and single photon detection. Non-uniformity in the current distribution in parallel superconducting strips leads to low detection efficiency and unstable operation, preventing the scale up to large area sensors. Recent studies indicate that non-uniform current distributions occurring in parallel strips can be understood and modeled in the framework of the generalized London model. Here we build on this important physical insight, investigating an innovative design with integrated superconducting-to-resistive Joule switches to break the superconducting loops between the strips and thus control the current dynamics. Employing precision low temperature nano-optical techniques, we map the uniformity of the current distribution before and after the resistive strip switching event, confirming the effectiveness of our design. These results provide important insights for the development of next generation large area superconducting strip-based sensors.
NASA Astrophysics Data System (ADS)
Hartmann, Alfred; Redfield, Steve
1989-04-01
This paper discusses the design of large-scale (1000 x 1000) optical crossbar switching networks for use in parallel processing supercomputers. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performance of alternative multiple access channel communications protocols (unslotted and slotted ALOHA, and carrier sense multiple access, CSMA) is compared with the performance of the classic arbitrated bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
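The ALOHA variants compared above have well-known closed-form throughput curves, which is what makes the ease-of-implementation versus speed trade-off quantifiable: for offered load G, throughput is S = G e^(-2G) for unslotted (pure) ALOHA and S = G e^(-G) for slotted ALOHA. A quick check in Python:

```python
import math

def throughput_pure_aloha(G):
    # classic result: S = G * exp(-2G); peaks at 1/(2e) ~ 0.184
    return G * math.exp(-2 * G)

def throughput_slotted_aloha(G):
    # S = G * exp(-G); peaks at 1/e ~ 0.368
    return G * math.exp(-G)

for G in (0.25, 0.5, 1.0):
    print(G, round(throughput_pure_aloha(G), 3),
             round(throughput_slotted_aloha(G), 3))
```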
A Parallel Vector Machine for the PM Programming Language
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes, will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
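The masked-vector-instruction idea is the same trick NumPy users know: evaluate both branches across the whole loop domain and select per element with a Boolean mask. A minimal illustration (not PM syntax):

```python
import numpy as np

# A vectorised if-then-else over a whole loop domain: both branches are
# evaluated as vector operations and a Boolean mask selects per element,
# mirroring the masked virtual-machine instructions described above.
x = np.array([-2.0, 3.0, -0.5, 4.0])
mask = x >= 0                      # the 'if' condition, one flag per invocation
result = np.where(mask, np.sqrt(np.abs(x)), -x)   # then-branch / else-branch
print(result)                      # [2. 1.732 0.5 2.] (rounded)
```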
Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis
NASA Technical Reports Server (NTRS)
Guerreiro, Nelson M.; Neitzke, Kurt W.
2012-01-01
A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change, at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.
Progress with the COGENT Edge Kinetic Code: Collision operator options
Dorf, M. A.; Cohen, R. H.; Compton, J. C.; ...
2012-06-27
In this study, COGENT is a continuum gyrokinetic code for edge plasmas being developed by the Edge Simulation Laboratory collaboration. The code is distinguished by application of a fourth-order conservative discretization and mapped multiblock grid technology to handle the geometric complexity of the tokamak edge. It is written in v∥-μ (parallel velocity - magnetic moment) velocity coordinates, and makes use of the gyrokinetic Poisson equation for the calculation of a self-consistent electric potential. In the present manuscript we report on the implementation and initial testing of a succession of increasingly detailed collision operator options, including a simple drag-diffusion operator in the parallel velocity space, Lorentz collisions, and a linearized model Fokker-Planck collision operator conserving momentum and energy. (© 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
Steering intermediate courses: desert ants combine information from various navigational routines.
Wehner, Rüdiger; Hoinville, Thierry; Cruse, Holk; Cheng, Ken
2016-07-01
A number of systems of navigation have been studied in some detail in insects. These include path integration, a system that keeps track of the straight-line distance and direction travelled on the current trip, the use of panoramic landmarks and scenery for orientation, and systematic searching. A traditional view is that only one navigational system is in operation at any one time, with different systems running in sequence depending on the context and conditions. We review selected data suggesting that often, different navigational cues (e.g., compass cues) and different systems of navigation are in operation simultaneously in desert ant navigation. The evidence suggests that all systems operate in parallel forming a heterarchical network. External and internal conditions determine the weights to be accorded to each cue and system. We also show that a model of independent modules feeding into a central summating device, the Navinet model, can in principle account for such data. No central executive processor is necessary aside from a weighted summation of the different cues and systems. Such a heterarchy of parallel systems all in operation represents a new view of insect navigation that has already been expressed informally by some authors.
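The "weighted summation of the different cues" can be made concrete as vector addition of unit headings: each routine votes for a direction with a weight reflecting its current reliability, and the steered course is the direction of the resultant. A toy sketch in the spirit of the Navinet model; the weights and angles are invented for illustration, not the published parameterization.

```python
import math

def combine_cues(cues):
    """Heading choice by weighted vector summation: each navigational
    routine contributes a unit vector toward its preferred heading,
    scaled by its weight; the output is the direction of the sum."""
    x = sum(w * math.cos(theta) for theta, w in cues)
    y = sum(w * math.sin(theta) for theta, w in cues)
    return math.atan2(y, x)

# path integration points home (0 rad), landmark panorama says 0.5 rad;
# weights reflect the current reliability of each routine
cues = [(0.0, 0.7), (0.5, 0.3)]
print(round(combine_cues(cues), 3))   # an intermediate course, ~0.15 rad
```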
DOT National Transportation Integrated Search
2014-02-01
The purpose of this memorandum is to provide recommended Total System Error (TSE) models for aircraft using RNAV (GPS) guidance when analyzing the wake encounter risk of proposed simultaneous dependent (paired) approach operations to Closely Spaced Parallel Runways.
Work at the Uddevalla Volvo Plant from the Perspective of the Demand-Control Model
ERIC Educational Resources Information Center
Lottridge, Danielle
2004-01-01
The Uddevalla Volvo plant represents a different paradigm for automotive assembly. In parallel-flow work, self-managed work groups assemble entire automobiles with comparable productivity as conventional series-flow assembly lines. From the perspective of the demand-control model, operators at the Uddevalla plant have low physical and timing…
An efficient implementation of a high-order filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-03-01
A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O(Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times that of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: the filter equation is applied to multiple cells, and then only the central cell is used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
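The abstract does not reproduce the filter equation itself; a common form for such a scale-selective, high-order implicit (Helmholtz-type) filter of order 2p, offered here as an assumed sketch rather than the paper's exact equation, is:

```latex
% Assumed generic form of a high-order implicit filter on the sphere:
% the filtered field \bar{\phi} is recovered from the raw field \phi
% by solving a Helmholtz-type elliptic equation of order 2p.
\[
  \left[\, 1 + (-1)^{p}\,\alpha\,\nabla^{2p} \,\right] \bar{\phi} \;=\; \phi ,
\]
% where \nabla^{2} is the spherical Laplacian and \alpha sets the
% cutoff scale; larger p gives a sharper scale-selective response.
```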
Brian Hears: Online Auditory Processing Using Vectorization Over Channels
Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453
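The channel-vectorization idea is simple to demonstrate: keep the loop over time samples, but update every frequency channel with one array operation per sample. Below is a toy bank of first-order recursive (leaky-integrator) filters in NumPy, standing in for the gammatone-style filterbanks Brian Hears actually provides; all names are illustrative.

```python
import numpy as np

def filterbank(sound, alphas):
    """Vectorize a bank of first-order recursive filters over channels:
    each time sample updates *all* channels in one vector operation,
    the core idea behind vectorization over frequency channels."""
    n_channels = len(alphas)
    state = np.zeros(n_channels)
    out = np.empty((len(sound), n_channels))
    for t, s in enumerate(sound):                 # loop over time only
        state = alphas * state + (1 - alphas) * s # all channels at once
        out[t] = state
    return out

sound = np.sin(np.linspace(0, 8 * np.pi, 400))
alphas = np.linspace(0.5, 0.99, 3000)   # one smoothing constant per channel
print(filterbank(sound, alphas).shape)  # (400, 3000)
```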
High subsonic flow tests of a parallel pipe followed by a large area ratio diffuser
NASA Technical Reports Server (NTRS)
Barna, P. S.
1975-01-01
Experiments were performed on a pilot model duct system in order to explore its aerodynamic characteristics. The model was scaled from a design projected for the high speed operation mode of the Aircraft Noise Reduction Laboratory. The test results show that the model performed satisfactorily and therefore the projected design will most likely meet the specifications.
Giménez, Beatriz; Özcan, Mutlu; Martínez-Rus, Francisco; Pradíes, Guillermo
2014-01-01
To evaluate the accuracy of a digital impression system based on parallel confocal red laser technology, taking into consideration clinical parameters such as operator experience and angulation and depth of implants. A maxillary master model with six implants (located bilaterally in the second molar, second premolar, and lateral incisor positions) was fitted with six polyether ether ketone scan bodies. One second premolar implant was placed with 30 degrees of mesial angulation; the opposite implant was positioned with 30 degrees of distal angulation. The lateral incisor implants were placed 2 or 4 mm subgingivally. Two experienced and two inexperienced operators performed intraoral scanning. Five different interimplant distances were then measured. The files obtained from the scans were imported with reverse-engineering software. Measurements were then made with a coordinate measurement machine, with values from the master model used as reference values. The deviations from the actual values were then calculated. The differences between experienced and inexperienced operators and the effects of different implant angulations and depths were compared statistically. Overall, operator 3 obtained significantly less accurate results. The angulated implants did not significantly influence accuracy compared to the parallel implants. Differences were found in the amount of error in the different quadrants. The second scanned quadrant had significantly worse results than the first scanned quadrant. Impressions of the implants placed at the tissue level were less accurate than implants placed 2 and 4 mm subgingivally. The operator affected the accuracy of measurements, but the performance of the operator was not necessarily dependent on experience. Angulated implants did not decrease the accuracy of the digital impression system tested. The scanned distance affected the predictability of the accuracy of the scanner, and the error increased with the increased length of the scanned section.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lewis, M.; Grimshaw, A.
1996-12-31
The Legion project at the University of Virginia is an architecture for designing and building system services that provide the illusion of a single virtual machine to users, a virtual machine that provides secure shared object and shared name spaces, application-adjustable fault tolerance, improved response time, and greater throughput. Legion targets wide-area assemblies of workstations, supercomputers, and parallel supercomputers. Legion tackles problems not solved by existing workstation-based parallel processing tools; the system will enable fault tolerance, wide-area parallel processing, interoperability, heterogeneity, a single global name space, protection, security, efficient scheduling, and comprehensive resource management. This paper describes the core Legion object model, which specifies the composition and functionality of Legion's core objects: those objects that cooperate to create, locate, manage, and remove objects in the Legion system. The object model facilitates a flexible extensible implementation, provides a single global name space, grants site autonomy to participating organizations, and scales to millions of sites and trillions of objects.
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
The AIS-5000 parallel processor
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schmitt, L.A.; Wilson, S.S.
1988-05-01
The AIS-5000 is a commercially available massively parallel processor which has been designed to operate in an industrial environment. It has fine-grained parallelism with up to 1024 processing elements arranged in a single-instruction multiple-data (SIMD) architecture. The processing elements are arranged in a one-dimensional chain that, for computer vision applications, can be as wide as the image itself. This architecture has superior cost/performance characteristics compared with two-dimensional mesh-connected systems. The design of the processing elements and their interconnections as well as the software used to program the system allow a wide variety of algorithms and applications to be implemented. In this paper, the overall architecture of the system is described. Various components of the system are discussed, including details of the processing elements, data I/O pathways and parallel memory organization. A virtual two-dimensional model for programming image-based algorithms for the system is presented. This model is supported by the AIS-5000 hardware and software and allows the system to be treated as a full-image-size, two-dimensional, mesh-connected parallel processor. Performance benchmarks are given for certain simple and complex functions.
GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sengupta, Dipanjan; Agarwal, Kapil; Song, Shuaiwen
2015-09-30
Recent work on real-world graph analytics has sought to leverage the massive amount of parallelism offered by GPU devices, but challenges remain due to the inherent irregularity of graph algorithms and limitations in GPU-resident memory for storing large graphs. We present GraphReduce, a highly efficient and scalable GPU-based framework that operates on graphs that exceed the device's internal memory capacity. GraphReduce adopts a combination of both edge- and vertex-centric implementations of the Gather-Apply-Scatter programming model and operates on multiple asynchronous GPU streams to fully exploit the high degrees of parallelism in GPUs with efficient graph data movement between the host and the device.
NASA Technical Reports Server (NTRS)
Simpkin, W. E.
1982-01-01
An approximately 0.25-scale model of the transition section of a tandem fan variable cycle engine nacelle was tested in the NASA Lewis Research Center 10-by-10 foot wind tunnel. Two 12-inch, tip-turbine driven fans were used to simulate a tandem fan engine. Three testing modes simulated a V/STOL tandem fan airplane. Parallel mode has two separate propulsion streams for maximum low speed performance. A front inlet, fan, and downward vectorable nozzle form one stream. An auxiliary top inlet provides air to the aft fan, supplying the core engine and aft vectorable nozzle. Front nozzle and top inlet closure, and removal of a blocker door separating the two streams, configure the tandem fan for series mode operation as a typical aircraft propulsion system. Transition mode operation is formed by intermediate settings of the front nozzle, blocker door, and top inlet. Emphasis was on the total pressure recovery and flow distortion at the aft fan face. A range of fan flow rates was tested at tunnel airspeeds from 0 to 240 knots and angles of attack from -10 to 40 deg for all three modes. In addition to the model variables for the three modes, variants of the top inlet were tested in the parallel mode only. These lip variables were aft lip boundary layer bleed holes and a three-position turning vane. A bellmouth extension of the top inlet side lips was also tested in parallel mode.
Research of influence of open-winding faults on properties of brushless permanent magnets motor
NASA Astrophysics Data System (ADS)
Bogusz, Piotr; Korkosz, Mariusz; Powrózek, Adam; Prokop, Jan; Wygonik, Piotr
2017-12-01
The paper presents an analysis of the influence of selected fault states on the properties of a brushless DC motor with permanent magnets. The subject of the study was a BLDC motor designed by the authors for an unmanned aerial vehicle hybrid drive. Four parallel branches per phase were provided in the discussed 3-phase motor. After an open-winding fault in one or a few parallel branches, operation of the motor can continue. Waveforms of currents, voltages, and electromagnetic torque were determined for the discussed fault states based on the developed mathematical and simulation models. Laboratory test results concerning the influence of open-winding faults in parallel branches on the properties of the BLDC motor are presented.
Blob dynamics in TORPEX poloidal null configurations
NASA Astrophysics Data System (ADS)
Shanahan, B. W.; Dudson, B. D.
2016-12-01
3D blob dynamics are simulated in X-point magnetic configurations in the TORPEX device via a non-field-aligned coordinate system, using an isothermal model which evolves density, vorticity, parallel velocity, and parallel current density. By modifying the parallel gradient operator to include perpendicular perturbations from poloidal field coils, numerical singularities associated with field-aligned coordinates are avoided. A comparison with a previously developed analytical model (Avino 2016 Phys. Rev. Lett. 116 105001) is performed and agreement is found with minimal modification. Experimental comparison determines that the null region can cause an acceleration of filaments due to increasing connection length, but this acceleration is small relative to other effects, which we quantify. Experimental measurements (Avino 2016 Phys. Rev. Lett. 116 105001) are reproduced, and the dominant acceleration mechanism is identified as that of a developing dipole in a moving background. Contributions from increasing connection length close to the null point are a small correction.
NASA Astrophysics Data System (ADS)
Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.
2011-12-01
With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Unit) and GPUs (Graphics Processing Unit) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPU's, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
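The farming pattern NRM implements (independent tasks fanned out to worker processors, results collected centrally) looks like this in miniature. The sketch below uses Python's standard library on a single machine, whereas NRM/JPPF distribute the same pattern across networked, heterogeneous nodes; correlate() is a hypothetical placeholder, not Sandia code.

```python
from concurrent.futures import ProcessPoolExecutor
import os

def correlate(task):
    """Stand-in for one parallelizable subtask, e.g. correlating one
    waveform segment against archival data in the detection workload."""
    seg_id, data = task
    return seg_id, sum(d * d for d in data)   # placeholder computation

if __name__ == "__main__":
    tasks = [(i, [float(j) for j in range(100)]) for i in range(1000)]
    # farm independent tasks across local cores; a resource manager
    # like NRM spreads the same task list across networked nodes
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        results = dict(pool.map(correlate, tasks, chunksize=50))
    print(len(results), "tasks completed")
```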
NASA Astrophysics Data System (ADS)
Liu, Wei; Li, Ying-jun; Jia, Zhen-yuan; Zhang, Jun; Qian, Min
2011-01-01
In the working process of huge heavy-load manipulators, such as free forging machines, hydraulic die-forging presses, forging manipulators, heavy grasping manipulators, and large-displacement manipulators, measurement of six-dimensional heavy force/torque and real-time force feedback at the operation interface are the basis for realizing coordinated operation control and force compliance control. It is also an effective way to raise control accuracy and achieve highly efficient manufacturing. To solve the dynamic measurement problem of six-dimensional time-varying heavy loads in extreme manufacturing processes, a novel principle of parallel load sharing for six-dimensional heavy force/torque is put forward. The measuring principle of the six-dimensional force sensor is analyzed, and its spatial model is built and decoupled. The load sharing ratios are analyzed and calculated in the vertical and horizontal directions. The mapping relationship between the six-dimensional heavy force/torque to be measured and the output force is established. The finite element model of the parallel piezoelectric six-dimensional heavy force/torque sensor is set up, and its static characteristics are analyzed with the ANSYS software. The main parameters that affect the load sharing ratio are analyzed. Experiments on load sharing with different diameters of the parallel axis were designed. The results show that the six-dimensional heavy force/torque sensor has good linearity, with non-linearity errors less than 1%. The parallel axis provides a good load sharing effect: the larger the diameter, the better the load sharing. The experimental results are in accordance with the FEM analysis. The sensor has the advantages of large measuring range, good linearity, high natural frequency, and high rigidity. It can be widely used in extreme environments for real-time, accurate measurement of six-dimensional time-varying huge loads on manipulators.
SKIRT: Hybrid parallelization of radiative transfer simulations
NASA Astrophysics Data System (ADS)
Verstocken, S.; Van De Putte, D.; Camps, P.; Baes, M.
2017-07-01
We describe the design, implementation and performance of the new hybrid parallelization scheme in our Monte Carlo radiative transfer code SKIRT, which has been used extensively for modelling the continuum radiation of dusty astrophysical systems including late-type galaxies and dusty tori. The hybrid scheme combines distributed memory parallelization, using the standard Message Passing Interface (MPI) to communicate between processes, and shared memory parallelization, providing multiple execution threads within each process to avoid duplication of data structures. The synchronization between multiple threads is accomplished through atomic operations without high-level locking (also called lock-free programming). This improves the scaling behaviour of the code and substantially simplifies the implementation of the hybrid scheme. The result is an extremely flexible solution that adjusts to the number of available nodes, processors and memory, and consequently performs well on a wide variety of computing architectures.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to another, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool, which enables application programmers to specify at a high level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
Control and protection system for paralleled modular static inverter-converter systems
NASA Technical Reports Server (NTRS)
Birchenough, A. G.; Gourash, F.
1973-01-01
A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.
Basic research needed for stimulating the development of behavioral technologies
Mace, F. Charles
1994-01-01
The costs of disconnection between the basic and applied sectors of behavior analysis are reviewed, and some solutions to these problems are proposed. Central to these solutions are collaborations between basic and applied behavioral scientists in programmatic research that addresses the behavioral basis and solution of human behavior problems. This kind of collaboration parallels the deliberate interactions between basic and applied researchers that have proven to be so profitable in other scientific fields, such as medicine. Basic research questions of particular relevance to the development of behavioral technologies are posed in the following areas: response allocation, resistance to change, countercontrol, formation and differentiation/discrimination of stimulus and response classes, analysis of low-rate behavior, and rule-governed behavior. Three interrelated strategies to build connections between the basic and applied analysis of behavior are identified: (a) the development of nonhuman animal models of human behavior problems using operations that parallel plausible human circumstances, (b) replication of the modeled relations with human subjects in the operant laboratory, and (c) tests of the generality of the model with actual human problems in natural settings. PMID:16812734
Efficient Thread Labeling for Monitoring Programs with Nested Parallelism
NASA Astrophysics Data System (ADS)
Ha, Ok-Kyoon; Kim, Sun-Sook; Jun, Yong-Kee
It is difficult and cumbersome to detect data races that occur in an execution of parallel programs. Any on-the-fly race detection technique using Lamport's happened-before relation needs a thread labeling scheme for generating unique identifiers which maintain logical concurrency information for the parallel threads. NR labeling is an efficient thread labeling scheme for the fork-join program model with nested parallelism, because its efficiency depends only on the nesting depth for every fork and join operation. This paper presents an improved NR labeling, called e-NR labeling, in which every thread generates its label by inheriting the pointer to its ancestor list from the parent threads or by updating the pointer in a constant amount of time and space. This labeling is more efficient than the NR labeling, because its efficiency does not depend on the nesting depth for every fork and join operation. Some experiments were performed with OpenMP programs having nesting depths of three or four and maximum parallelism varying from 10,000 to 1,000,000. The results show that e-NR is 5 times faster than NR labeling and 4.3 times faster than OS labeling in the average time for creating and maintaining the thread labels. In the average space required for labeling, it is 3.5 times smaller than NR labeling and 3 times smaller than OS labeling.
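The constant-time pointer inheritance can be illustrated with a deliberately simplified sketch: each label stores only a parent pointer and a branch index, so creating a label at a fork is O(1) regardless of nesting depth. The concurrency test here ignores join edges entirely (real NR/e-NR labels also encode joins), so it is only valid for labels of still-live sibling branches; all names are hypothetical.

```python
class Label:
    """Toy thread label for a nested fork-join tree. Creation is O(1)
    by inheriting a pointer to the parent label, in the spirit of
    e-NR's pointer inheritance. Join edges are ignored, which is a
    deliberate simplification of the real scheme."""
    def __init__(self, parent=None, branch=0):
        self.parent = parent      # O(1) inheritance instead of list copying
        self.branch = branch      # which fork branch this thread is on
        self.depth = 0 if parent is None else parent.depth + 1

def concurrent(a, b):
    # climb to equal depth, then to the children of the lowest common fork
    while a.depth > b.depth:
        a = a.parent
    while b.depth > a.depth:
        b = b.parent
    while a.parent is not b.parent:
        a, b = a.parent, b.parent
    return a is not b and a.branch != b.branch

root = Label()
t1, t2 = Label(root, 0), Label(root, 1)   # root forks two child threads
g1 = Label(t1, 0)                          # t1 forks a grandchild
print(concurrent(t1, t2))    # True: sibling branches of the same fork
print(concurrent(root, g1))  # False: root happened-before its descendants
```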
Ouellet, Jean A.; Richards, Corey; Sardar, Zeeshan M.; Giannitsios, Demetri; Noiseux, Nicholas; Strydom, Willem S.; Reindl, Rudy; Jarzem, Peter; Arlet, Vincent; Steffen, Thomas
2013-01-01
The ideal treatment for unstable thoracolumbar fractures remains controversial with posterior reduction and stabilization, anterior reduction and stabilization, combined posterior and anterior reduction and stabilization, and even nonoperative management advocated. Short segment posterior osteosynthesis of these fractures has less comorbidities compared with the other operative approaches but settles into kyphosis over time. Biomechanical comparison of the divergent bridge construct versus the parallel tension band construct was performed for anteriorly destabilized T11–L1 spine segments using three different models: (1) finite element analysis (FEA), (2) a synthetic model, and (3) a human cadaveric model. Outcomes measured were construct stiffness and ultimate failure load. Our objective was to determine if the divergent pedicle screw bridge construct would provide more resistance to kyphotic deforming forces. All three modalities showed greater stiffness with the divergent bridge construct. The FEA calculated a stiffness of 21.6 N/m for the tension band construct versus 34.1 N/m for the divergent bridge construct. The synthetic model resulted in a mean stiffness of 17.3 N/m for parallel tension band versus 20.6 N/m for the divergent bridge (p = 0.03), whereas the cadaveric model had an average stiffness of 15.2 N/m in the parallel tension band compared with 18.4 N/m for the divergent bridge (p = 0.02). Ultimate failure load with the cadaveric model was found to be 622 N for the divergent bridge construct versus 419 N (p = 0.15) for the parallel tension band construct. This study confirms our clinical experience that the short posterior divergent bridge construct provides greater stiffness for the management of unstable thoracolumbar fractures. PMID:24436856
First Applications of the New Parallel Krylov Solver for MODFLOW on a National and Global Scale
NASA Astrophysics Data System (ADS)
Verkaik, J.; Hughes, J. D.; Sutanudjaja, E.; van Walsum, P.
2016-12-01
Integrated high-resolution hydrologic models are increasingly being used for evaluating water management measures at field scale. Their drawbacks are large memory requirements and long run times. Examples of such models are The Netherlands Hydrological Instrument (NHI) model and the PCRaster Global Water Balance (PCR-GLOBWB) model. Typical simulation periods are 30-100 years with daily timesteps. The NHI model predicts water demands in periods of drought, supporting operational and long-term water-supply decisions. The NHI is a state-of-the-art coupling of several models: a 7-layer MODFLOW groundwater model ( 6.5M 250m cells), a MetaSWAP model for the unsaturated zone (Richards emulator of 0.5M cells), and a surface water model (MOZART-DM). The PCR-GLOBWB model provides a grid-based representation of global terrestrial hydrology and this work uses the version that includes a 2-layer MODFLOW groundwater model ( 4.5M 10km cells). The Parallel Krylov Solver (PKS) speeds up computation by both distributed memory parallelization (Message Passing Interface) and shared memory parallelization (Open Multi-Processing). PKS includes conjugate gradient, bi-conjugate gradient stabilized, and generalized minimal residual linear accelerators that use an overlapping additive Schwarz domain decomposition preconditioner. PKS can be used for both structured and unstructured grids and has been fully integrated in MODFLOW-USG using METIS partitioning and in iMODFLOW using RCB partitioning. iMODFLOW is an accelerated version of MODFLOW-2005 that is implicitly and online coupled to MetaSWAP. Results for benchmarks carried out on the Cartesius Dutch supercomputer (https://userinfo.surfsara.nl/systems/cartesius) for the PCRGLOB-WB model and on a 2x16 core Windows machine for the NHI model show speedups up to 10-20 and 5-10, respectively.
Northern Arabian Sea Circulation - Autonomous Research: Optimal Planning Systems (NASCar-OPS)
2015-09-30
vehicles (gliders, drifters, floats, and/or wave-gliders) - Provide guidance for persistent optimal sampling, including for long-duration observation...headings and relative operating speeds will be provided to the operational fleets of instruments and vehicles (e.g. gliders, drifters, floats or wave-gliders). We plan to use models specific to vehicle types (floats, wave-gliders, etc.). We also plan to further parallelize and optimize our codes
Parallel Algorithms and Patterns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robey, Robert W.
2016-06-16
This is a PowerPoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of such problems include: sorting, searching, optimization, and matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: reductions, prefix scans, and ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
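As an illustration of one of the named patterns (not taken from the presentation), below is a Python sketch of the work-efficient Blelloch exclusive prefix scan; the input array and the power-of-two length are assumptions for brevity. Each slice update touches disjoint elements, which is exactly what a parallel runtime exploits.

import numpy as np

def blelloch_scan(a):
    # Work-efficient exclusive prefix scan: an up-sweep builds partial
    # sums in place, then a down-sweep distributes them back.
    x = np.array(a, dtype=np.int64)
    n = len(x)                      # assumed a power of two in this sketch
    d = 1
    while d < n:                    # up-sweep (tree reduction)
        x[2 * d - 1::2 * d] += x[d - 1::2 * d]
        d *= 2
    x[-1] = 0
    d = n // 2
    while d >= 1:                   # down-sweep
        t = x[d - 1::2 * d].copy()
        x[d - 1::2 * d] = x[2 * d - 1::2 * d]
        x[2 * d - 1::2 * d] += t
        d //= 2
    return x

print(blelloch_scan([3, 1, 7, 0, 4, 1, 6, 3]))   # [ 0  3  4 11 11 15 16 22]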
Optimization of a new flow design for solid oxide cells using computational fluid dynamics modelling
NASA Astrophysics Data System (ADS)
Duhn, Jakob Dragsbæk; Jensen, Anker Degn; Wedel, Stig; Wix, Christian
2016-12-01
Design of a gas distributor to distribute gas flow into parallel channels for Solid Oxide Cells (SOC) is optimized, with respect to flow distribution, using Computational Fluid Dynamics (CFD) modelling. The CFD model is based on a 3D geometric model and the optimized structural parameters include the width of the channels in the gas distributor and the area in front of the parallel channels. The flow of the optimized design is found to have a flow uniformity index value of 0.978. The effects of deviations from the assumptions used in the modelling (isothermal and non-reacting flow) are evaluated and it is found that a temperature gradient along the parallel channels does not affect the flow uniformity, whereas a temperature difference between the channels does. The impact of the flow distribution on the maximum obtainable conversion during operation is also investigated and the obtainable overall conversion is found to be directly proportional to the flow uniformity. Finally, the effect of manufacturing errors is investigated. The design is shown to be robust towards deviations from design dimensions of at least ±0.1 mm, which is well within obtainable tolerances.
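The abstract does not define its uniformity index; a widely used area-weighted definition is gamma = 1 - sum_i(A_i * |u_i - u_mean|) / (2 * u_mean * sum_i(A_i)), sketched below in Python. The velocities are invented and the paper may use a different variant.

import numpy as np

def uniformity_index(u, area=None):
    # Area-weighted flow uniformity index: 1.0 means perfectly uniform flow.
    # gamma = 1 - sum_i(A_i * |u_i - u_mean|) / (2 * u_mean * sum_i(A_i))
    u = np.asarray(u, dtype=float)
    area = np.ones_like(u) if area is None else np.asarray(area, dtype=float)
    u_mean = np.average(u, weights=area)
    return 1.0 - np.sum(area * np.abs(u - u_mean)) / (2.0 * u_mean * np.sum(area))

# Channel velocities close to their mean give an index near 1.
print(uniformity_index([1.00, 0.97, 1.03, 0.99, 1.01]))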
Automatic recognition of vector and parallel operations in a higher level language
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1971-01-01
A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R
Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
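PAMI is IBM-specific, but the non-blocking collective idea the patent describes can be illustrated with standard MPI-3 non-blocking collectives. A minimal mpi4py sketch (buffers, sizes, and the filename are invented) that overlaps local work with an in-flight allreduce:

# Run with: mpiexec -n 4 python nbc_sketch.py   (filename is arbitrary)
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
send = np.full(4, comm.Get_rank(), dtype='i')
recv = np.empty(4, dtype='i')

req = comm.Iallreduce(send, recv, op=MPI.SUM)   # start collective, don't block
local = send * 2                                # overlap unrelated local work
req.Wait()                                      # complete the collective

if comm.Get_rank() == 0:
    print(recv)   # each element holds the sum of all ranks (0+1+2+3 = 6)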
Xenon Formal Security Policy Model
2007-08-14
munication primitives such as locks or semaphores, machine instruction results, hypercall results, traps, and interrupts. For an informal example...communicated on the corresponding side of the parallel operator. Events that are in X ∪ Y are synchronized over the two processes. So if we define
Toward Petascale Biologically Plausible Neural Networks
NASA Astrophysics Data System (ADS)
Long, Lyle
This talk will describe an approach to achieving petascale neural networks. Artificial intelligence has been oversold for many decades. Computers in the beginning could only do about 16,000 operations per second. Computer processing power, however, has been doubling every two years thanks to Moore's law, and growing even faster due to massively parallel architectures. Finally, 60 years after the first AI conference we have computers on the order of the performance of the human brain (10^16 operations per second). The main issues now are algorithms, software, and learning. We have excellent models of neurons, such as the Hodgkin-Huxley model, but we do not know how the human neurons are wired together. With careful attention to efficient parallel computing, event-driven programming, table lookups, and memory minimization, massive scale simulations can be performed. The code that will be described was written in C++ and uses the Message Passing Interface (MPI). It uses the full Hodgkin-Huxley neuron model, not a simplified model. It also allows arbitrary network structures (deep, recurrent, convolutional, all-to-all, etc.). The code is scalable, and has, so far, been tested on up to 2,048 processor cores using 10^7 neurons and 10^9 synapses.
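For reference, the Hodgkin-Huxley model the talk builds on amounts to four coupled ODEs per neuron. A minimal single-neuron forward-Euler sketch in Python with the textbook parameters follows; the stimulus current and time step are arbitrary choices, and the talk's code integrates millions of such neurons across MPI ranks rather than one in a loop.

import numpy as np

# Classic Hodgkin-Huxley single-neuron model, forward Euler for brevity.
C, gNa, gK, gL = 1.0, 120.0, 36.0, 0.3            # uF/cm^2 and mS/cm^2
ENa, EK, EL = 50.0, -77.0, -54.4                  # mV

def vtrap(x, y):
    # Limit-safe form of x / (1 - exp(-x/y)), which tends to y as x -> 0.
    return y if abs(x) < 1e-7 else x / (1.0 - np.exp(-x / y))

def rates(V):
    an, bn = 0.01 * vtrap(V + 55, 10), 0.125 * np.exp(-(V + 65) / 80)
    am, bm = 0.1 * vtrap(V + 40, 10), 4.0 * np.exp(-(V + 65) / 18)
    ah, bh = 0.07 * np.exp(-(V + 65) / 20), 1.0 / (1 + np.exp(-(V + 35) / 10))
    return an, bn, am, bm, ah, bh

V, n, m, h = -65.0, 0.317, 0.053, 0.596           # resting state
dt, I_ext = 0.01, 10.0                            # ms, uA/cm^2
for _ in range(int(50 / dt)):                     # 50 ms of simulated time
    an, bn, am, bm, ah, bh = rates(V)
    I_ion = gNa * m**3 * h * (V - ENa) + gK * n**4 * (V - EK) + gL * (V - EL)
    V += dt * (I_ext - I_ion) / C
    n += dt * (an * (1 - n) - bn * n)
    m += dt * (am * (1 - m) - bm * m)
    h += dt * (ah * (1 - h) - bh * h)
print(f"V after 50 ms of 10 uA/cm^2 drive: {V:.1f} mV")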
Modified Petri net model sensitivity to workload manipulations
NASA Technical Reports Server (NTRS)
White, S. A.; Mackinnon, D. P.; Lyman, J.
1986-01-01
Modified Petri Nets (MPNs) are investigated as a workload modeling tool. The results of an exploratory study of the sensitivity of MPNs to workload manipulations in a dual task are described. Petri nets have been used to represent systems with asynchronous, concurrent and parallel activities (Peterson, 1981). These characteristics led some researchers to suggest the use of Petri nets in workload modeling where concurrent and parallel activities are common. Petri nets are represented by places and transitions. In the workload application, places represent operator activities and transitions represent events. MPNs have been used to formally represent task events and activities of a human operator in a man-machine system. Some descriptive applications demonstrate the usefulness of MPNs in the formal representation of systems. It is the general hypothesis herein that in addition to descriptive applications, MPNs may be useful for workload estimation and prediction. The results are reported of the first of a series of experiments designed to develop and test a MPN system of workload estimation and prediction. This first experiment is a screening test of MPN model general sensitivity to changes in workload. Positive results from this experiment will justify the more complicated analyses and techniques necessary for developing a workload prediction system.
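As a minimal illustration of the place/transition machinery described above (the places and transitions below are invented, not taken from the study):

# A transition is enabled when every input place holds a token; firing
# moves tokens from input places to output places. Concurrent operator
# activities show up as transitions that are enabled at the same time.
marking = {"idle": 1, "stimulus": 1, "busy": 0, "done": 0}
transitions = {
    "start":  {"in": ["idle", "stimulus"], "out": ["busy"]},
    "finish": {"in": ["busy"],             "out": ["idle", "done"]},
}

def enabled(name):
    return all(marking[p] >= 1 for p in transitions[name]["in"])

def fire(name):
    assert enabled(name), f"{name} is not enabled"
    for p in transitions[name]["in"]:
        marking[p] -= 1
    for p in transitions[name]["out"]:
        marking[p] += 1

fire("start")     # operator begins the task
fire("finish")    # task completes, operator becomes idle again
print(marking)    # {'idle': 1, 'stimulus': 0, 'busy': 0, 'done': 1}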
Parallel operation of NH3 screw compressors - the optimum way
NASA Astrophysics Data System (ADS)
Pijnenburg, B.; Ritmann, J.
2015-08-01
Using several smaller industrial NH3 screw compressors operating in parallel appears to be the optimum way to achieve maximum part-load efficiency, increased redundancy, and other features in high demand in today's industrial refrigeration industry. Parallel operation, arranged in an optimum way, can secure continuous operation and can in most applications be configured to improve overall operating economy. New compressors are developed to meet requirements for flexibility in operation and are controlled in an intelligent way. The intelligent control system keeps its focus on all external demands while always striving to offer the lowest possible absorbed power, including in future scenarios with connection to a smart grid.
NASA Astrophysics Data System (ADS)
Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike
The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-02-01
The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.
Parallel Adaptive Mesh Refinement Library
NASA Technical Reports Server (NTRS)
Mac-Neice, Peter; Olson, Kevin
2005-01-01
Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
Regularized wave equation migration for imaging and data reconstruction
NASA Astrophysics Data System (ADS)
Kaplan, Sam T.
The reflection seismic experiment results in a measurement (reflection seismic data) of the seismic wavefield. The linear Born approximation to the seismic wavefield leads to a forward modelling operator that we use to approximate reflection seismic data in terms of a scattering potential. We consider approximations to the scattering potential using two methods: the adjoint of the forward modelling operator (migration), and regularized numerical inversion using the forward and adjoint operators. We implement two parameterizations of the forward modelling and migration operators: source-receiver and shot-profile. For both parameterizations, we find the requisite Green's functions using the split-step approximation. We first develop the forward modelling operator, and then find the adjoint (migration) operator by recognizing a Fredholm integral equation of the first kind. The resulting numerical system is generally under-determined, requiring prior information to find a solution. In source-receiver migration, the parameterization of the scattering potential is understood using the migration imaging condition, and this encourages us to apply sparse prior models to the scattering potential. To that end, we use both a Cauchy prior and a mixed Cauchy-Gaussian prior, finding better resolved estimates of the scattering potential than are given by the adjoint. In shot-profile migration, the parameterization of the scattering potential has its redundancy in multiple active energy sources (i.e. shots). We find that a smallest-model regularized inverse representation of the scattering potential gives a more resolved picture of the earth, as compared to the simpler adjoint representation. The shot-profile parameterization allows us to introduce a joint inversion to further improve the estimate of the scattering potential. Moreover, it allows us to introduce a novel data reconstruction algorithm so that limited data can be interpolated/extrapolated. The linearized operators are expensive, encouraging their parallel implementation. For the source-receiver parameterization of the scattering potential this parallelization is non-trivial. Seismic data is typically corrupted by various types of noise. Sparse coding can be used to suppress noise prior to migration. It is a method that stems from information theory and that we apply to noise suppression in seismic data.
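As a toy stand-in for the forward/adjoint operator pairs and smallest-model regularization the abstract mentions (none of this code is from the thesis), one can use 1D convolution as the forward operator and correlation as its adjoint, verify the pair with a dot-product test, and invert with damped gradient descent:

import numpy as np

# Toy stand-in for the Born modelling operator: convolution with a wavelet.
# The adjoint ("migration" in this analogy) is correlation with the wavelet.
w = np.array([1.0, -2.0, 1.0])

def forward(m):
    return np.convolve(m, w, mode="full")            # d = L m

def adjoint(d):
    return np.correlate(d, w, mode="valid")          # L^T d

def dot_test(n=50):
    # The defining property of an adjoint pair: <L m, d> == <m, L^T d>.
    m, d = np.random.randn(n), np.random.randn(n + len(w) - 1)
    return np.allclose(forward(m) @ d, m @ adjoint(d))

def damped_lsq(d, mu=0.1, nit=300, step=0.05):
    # Smallest-model (Tikhonov-damped) least squares by gradient descent
    # on |L m - d|^2 + mu |m|^2; a Krylov solver would be used in practice.
    m = np.zeros(len(d) - len(w) + 1)
    for _ in range(nit):
        m -= step * (adjoint(forward(m) - d) + mu * m)
    return m

true = np.zeros(50); true[20], true[35] = 1.0, -0.5
d = forward(true)
m_hat = damped_lsq(d)
print(dot_test(), float(np.linalg.norm(forward(m_hat) - d)))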
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes
Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; ,Caffrey, Patrick; Fier, Jeffrey
2010-10-05
In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
Power combining in an array of microwave power rectifiers
NASA Technical Reports Server (NTRS)
Gutmann, R. J.; Borrego, J. M.
1979-01-01
This work analyzes the resultant efficiency degradation when identical rectifiers operate at different RF power levels as caused by the power beam taper. Both a closed-form analytical circuit model and a detailed computer-simulation model are used to obtain the output dc load line of the rectifier. The efficiency degradation is nearly identical with series and parallel combining, and the closed-form analytical model provides results which are similar to the detailed computer-simulation model.
NASA Technical Reports Server (NTRS)
Gore, Brian Francis; Hooey, Becky Lee; Haan, Nancy; Socash, Connie; Mahlstedt, Eric; Foyle, David C.
2013-01-01
The Closely Spaced Parallel Operations (CSPO) scenario is a complex human performance model scenario that tested alternate operator roles and responsibilities in a series of off-nominal operations on approach and landing (see Gore, Hooey, Mahlstedt, & Foyle, 2013). The model links together the procedures, equipment, crewstation, and external environment to produce predictions of operator performance in response to Next Generation system designs, like those expected in the National Airspace's NextGen concepts. The task analysis contained in the present report comes from the task analysis window in the MIDAS software. These tasks link definitions and states for equipment components, environmental features, and operational contexts. The current task analysis culminated in 3300 tasks that included over 1000 Subject Matter Expert (SME)-vetted, re-usable procedural sets for three critical phases of flight: the Descent, Approach, and Land procedural sets (see Gore et al., 2011 for a description of the development of the tasks included in the model; Gore, Hooey, Mahlstedt, & Foyle, 2013 for a description of the model and its results; Hooey, Gore, Mahlstedt, & Foyle, 2013 for a description of the guidelines that were generated from the model's results; and Gore, Hooey, & Foyle, 2012 for a description of the model's implementation and its settings). The rollout, after-landing checks, taxi to gate, and arrive at gate illustrated in Figure 1 were not used in the approach and divert scenarios exercised. The other networks in Figure 1 set up appropriate context settings for the flight deck. The current report presents the model's task decomposition from the highest level and decomposes it to finer-grained levels. The first task completed by the model is to set all of the initial settings for the scenario runs included in the model (network 75 in Figure 1). This initialization process also resets the CAD graphic files contained within MIDAS, as well as the embedded operator models that comprise MIDAS. Following the initial settings, the model progresses to begin the first tasks required of the two flight deck operators, the Captain (CA) and the First Officer (FO). The task sets initialize operator-specific settings prior to loading all of the alerts, probes, and other events that occur in the scenario. As a note, CA and FO were the terms used in developing this model, but the CA can also be thought of as the Pilot Flying (PF), while the FO can be considered the Pilot-Not-Flying (PNF) or Pilot Monitoring (PM). As such, the document refers to the operators as PF/CA and PNF/FO, respectively.
Vectorized and multitasked solution of the few-group neutron diffusion equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zee, S.K.; Turinsky, P.J.; Shayer, Z.
1989-03-01
A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. For the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7, 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of ~61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.
Miao Meng; Kiani, Mehdi
2016-08-01
In order to achieve efficient wireless power transmission (WPT) to biomedical implants with millimeter (mm) dimensions, ultrasonic WPT links have recently been proposed. Operating both transmitter (Tx) and receiver (Rx) ultrasonic transducers at their resonance frequency (fr) is key in improving power transmission efficiency (PTE). In this paper, different resonance configurations for Tx and Rx transducers, including series and parallel resonance, have been studied to help the designers of ultrasonic WPT links to choose the optimal resonance configuration for Tx and Rx that maximizes PTE. The geometries for disk-shaped transducers of four different sets of links, operating at series-series, series-parallel, parallel-series, and parallel-parallel resonance configurations in Tx and Rx, have been found through finite-element method (FEM) simulation tools for operation at fr of 1.4 MHz. Our simulation results suggest that operating the Tx transducer with parallel resonance increases PTE, while the resonance configuration of the mm-sized Rx transducer highly depends on the load resistance, Rl. For applications that involve large Rl in the order of tens of kΩ, a parallel resonance for a mm-sized Rx leads to higher PTE, while series resonance is preferred for Rl in the order of several kΩ and below.
2015-06-01
...efficient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show...apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying...repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator
Operation of high power converters in parallel
NASA Technical Reports Server (NTRS)
Decker, D. K.; Inouye, L. Y.
1993-01-01
High power converters that are used in space power subsystems are limited in power handling capability due to component and thermal limitations. For applications, such as Space Station Freedom, where multi-kilowatts of power must be delivered to user loads, parallel operation of converters becomes an attractive option when considering overall power subsystem topologies. TRW developed three different unequal power sharing approaches for parallel operation of converters. These approaches, known as droop, master-slave, and proportional adjustment, are discussed and test results are presented.
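Of the three sharing approaches named, droop is the easiest to sketch: each converter's output voltage set point falls linearly with its output current, so paralleled units share a common bus without communication, and unequal droop gains give unequal shares. A minimal Python sketch with invented numbers (not TRW's implementation):

def droop_share(V0, k, I_load):
    # Droop law V_bus = V0_i - k_i * I_i for every unit on a common bus.
    # Solving the droop equations for the bus voltage given total load current:
    #   V_bus = (sum(V0_i / k_i) - I_load) / sum(1 / k_i)
    V_bus = (sum(v / ki for v, ki in zip(V0, k)) - I_load) / sum(1.0 / ki for ki in k)
    currents = [(v - V_bus) / ki for v, ki in zip(V0, k)]
    return V_bus, currents

# Two converters, identical set points, 1:2 droop gains: the unit with
# half the droop gain carries twice the current.
V_bus, currents = droop_share(V0=[120.0, 120.0], k=[0.5, 1.0], I_load=30.0)
print(V_bus, currents)   # 110.0 [20.0, 10.0]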
Compact modeling of CRS devices based on ECM cells for memory, logic and neuromorphic applications.
Linn, E; Menzel, S; Ferch, S; Waser, R
2013-09-27
Dynamic physics-based models of resistive switching devices are of great interest for the realization of complex circuits required for memory, logic and neuromorphic applications. Here, we apply such a model of an electrochemical metallization (ECM) cell to complementary resistive switches (CRSs), which are favorable devices to realize ultra-dense passive crossbar arrays. Since a CRS consists of two resistive switching devices, it is straightforward to apply the dynamic ECM model for CRS simulation with MATLAB and SPICE, enabling study of the device behavior in terms of sweep rate and series resistance variations. Furthermore, typical memory access operations as well as basic implication logic operations can be analyzed, revealing requirements for proper spike and level read operations. This basic understanding facilitates applications of massively parallel computing paradigms required for neuromorphic applications.
Human Resource Scheduling in Performing a Sequence of Discrete Responses
2009-02-28
each is a graph comparing simulated results of each respective model with data from Experiment 3b. As described below, the parameters of the model...initiated in parallel with ongoing Central operations on another. To fix model parameters we estimated the range of times to perform the sum of the...standard deviation for each parameter was set to 50% of the mean value. Initial simulations found no meaningful differences between setting the standard
Nonlinear and Dissipation Characteristics of Ocean Surface Waves in Estuarine Environments
2013-09-30
developed models while using the general framework of operational wave models. We will conduct robustness tests of the system to determine the...and Guza (1984) model is weakly dispersive, in line with the assumptions behind the Boussinesq equations from which it was derived. The Kaihatu and...interactions across both frequencies and directions. This system of equations is solved over a 2D frequency (f) and shore-parallel wavenumber (κ) space. The
Operator assistant to support deep space network link monitor and control
NASA Technical Reports Server (NTRS)
Cooper, Lynne P.; Desai, Rajiv; Martinez, Elmain
1992-01-01
Preparing the Deep Space Network (DSN) stations to support spacecraft missions (referred to as pre-cal, for pre-calibration) is currently an operator and time intensive activity. Operators are responsible for sending and monitoring several hundred operator directives, messages, and warnings. Operator directives are used to configure and calibrate the various subsystems (antenna, receiver, etc.) necessary to establish a spacecraft link. Messages and warnings are issued by the subsystems upon completion of an operation, changes of status, or an anomalous condition. Some parts of pre-cal are logically parallel. Significant time savings could be realized if the existing Link Monitor and Control system (LMC) could support the operator in exploiting the parallelism inherent in pre-cal activities. Currently, operators may work on the individual subsystems in parallel; however, the burden of monitoring these parallel operations resides solely with the operator. Messages, warnings, and directives are all presented as they are received, without being correlated to the event that triggered them. Pre-cal is essentially an overhead activity. During pre-cal, no mission is supported, and no other activity can be performed using the equipment in the link. Therefore, it is highly desirable to reduce pre-cal time as much as possible. One approach to do this, as well as to increase efficiency and reduce errors, is the LMC Operator Assistant (OA). The LMC OA prototype demonstrates an architecture which can be used in concert with the existing LMC to exploit parallelism in pre-cal operations while providing the operators with a true monitoring capability, situational awareness, and positive control. This paper presents an overview of the LMC OA architecture and the results from initial prototyping and test activities.
Development of Tokamak Transport Solvers for Stiff Confinement Systems
NASA Astrophysics Data System (ADS)
St. John, H. E.; Lao, L. L.; Murakami, M.; Park, J. M.
2006-10-01
Leading transport models such as GLF23 [1] and MM95 [2] describe turbulent plasma energy, momentum and particle flows. In order to accommodate existing transport codes and associated solution methods, effective diffusivities have to be derived from these turbulent flow models. This can cause significant problems in predicting unique solutions. We have developed a parallel transport code solver, GCNMP, that can accommodate both flow based and diffusivity based confinement models by solving the discretized nonlinear equations using modern Newton, trust region, steepest descent and homotopy methods. We present our latest development efforts, including multiple dynamic grids, application of two-level parallel schemes, and operator splitting techniques that allow us to combine flow based and diffusivity based models in tokamak simulations. [1] R.E. Waltz, et al., Phys. Plasmas 4, 7 (1997). [2] G. Bateman, et al., Phys. Plasmas 5, 1793 (1998).
Injector Design Tool Improvements: User's manual for FDNS V.4.5
NASA Technical Reports Server (NTRS)
Chen, Yen-Sen; Shang, Huan-Min; Wei, Hong; Liu, Jiwen
1998-01-01
The major emphasis of the current effort is in the development and validation of an efficient parallel machine computational model, based on the FDNS code, to analyze the fluid dynamics of a wide variety of liquid jet configurations for general liquid rocket engine injection system applications. This model includes physical models for droplet atomization, breakup/coalescence, evaporation, turbulence mixing and gas-phase combustion. Benchmark validation cases for liquid rocket engine chamber combustion conditions will be performed for model validation purposes. Test cases may include shear coaxial, swirl coaxial and impinging injection systems with combinations of LOX/H2 or LOX/RP-1 propellant injector elements used in rocket engine designs. As a final goal of this project, a well tested parallel CFD performance methodology, together with a description of its operation for users, will be reported in a final technical report at the end of the proposed research effort.
Molecular Sticker Model Simulation on Silicon for a Maximum Clique Problem
Ning, Jianguo; Li, Yanmei; Yu, Wen
2015-01-01
Molecular computers (also called DNA computers), as an alternative to traditional electronic computers, are smaller in size but more energy efficient, and have massive parallel processing capacity. However, DNA computers may not outperform electronic computers owing to their higher error rates and some limitations of the biological laboratory. The stickers model, as a typical DNA-based computer, is computationally complete and universal, and can be viewed as a bit-vertically operating machine. This makes it attractive for silicon implementation. Inspired by the information processing method on the stickers computer, we propose a novel parallel computing model called DEM (DNA Electronic Computing Model) on System-on-a-Programmable-Chip (SOPC) architecture. Except for the significant difference in the computing medium—transistor chips rather than bio-molecules—the DEM works similarly to DNA computers in immense parallel information processing. Additionally, a plasma display panel (PDP) is used to show the change of solutions, and helps us directly see the distribution of assignments. The feasibility of the DEM is tested by applying it to compute a maximum clique problem (MCP) with eight vertices. Owing to the limited computing resources on SOPC architecture, the DEM could solve moderate-size problems in polynomial time. PMID:26075867
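The DEM itself is hardware, but the bit-vertical flavor of sticker-model computation can be suggested in a few lines of Python: encode vertex subsets of an 8-vertex graph as bitmasks and test cliques with bitwise operations. The graph below is invented, and a real sticker/DEM machine tests many candidate subsets simultaneously rather than in a loop.

# Vertices are bits; adj[v] is the neighbour set of v as a bitmask.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4),
         (4, 5), (4, 6), (5, 6), (5, 7), (6, 7)]
n = 8
adj = [0] * n
for u, v in edges:
    adj[u] |= 1 << v
    adj[v] |= 1 << u

def is_clique(mask):
    # Every vertex in the subset must be adjacent to all other members.
    for v in range(n):
        if mask >> v & 1:
            others = mask ^ (1 << v)
            if adj[v] & others != others:
                return False
    return True

best = max((m for m in range(1 << n) if is_clique(m)),
           key=lambda m: bin(m).count("1"))
print([v for v in range(n) if best >> v & 1])   # one maximum clique, size 3 here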
2014-09-18
methods of flight plan optimization, and yielded such techniques as: parallel A* (Gudaitis, 1994), Multi-Objective Traveling Salesman algorithms...Problem Statement...currently their utilization comes with a price: "Today's unmanned systems require significant human interaction to operate. As
NASA Astrophysics Data System (ADS)
Schmitz, Oliver; de Jong, Kor; Karssenberg, Derek
2017-04-01
There is an increasing demand to run environmental models on a big scale: simulations over large areas at high resolution. The heterogeneity of available computing hardware such as multi-core CPUs, GPUs or supercomputers potentially provides significant computing power to fulfil this demand. However, this requires detailed knowledge of the underlying hardware, parallel algorithm design and the implementation thereof in an efficient system programming language. Domain scientists such as hydrologists or ecologists often lack this specific software engineering knowledge; their emphasis is (and should be) on exploratory building and analysis of simulation models. As a result, models constructed by domain specialists mostly do not take full advantage of the available hardware. A promising solution is to separate the model building activity from software engineering by offering domain specialists a model building framework with pre-programmed building blocks that they combine to construct a model. The model building framework, consequently, needs to have built-in capabilities to make full usage of the available hardware. Developing such a framework that provides understandable code for domain scientists while being runtime efficient poses several challenges for the developers of such a framework. For example, optimisations can be performed on individual operations or the whole model, or tasks need to be generated for a well-balanced execution without explicitly knowing the complexity of the domain problem provided by the modeller. Ideally, a modelling framework supports the optimal use of available hardware whichever combination of model building blocks scientists use. We demonstrate our ongoing work on developing parallel algorithms for spatio-temporal modelling and demonstrate 1) PCRaster, an environmental software framework (http://www.pcraster.eu) providing spatio-temporal model building blocks and 2) parallelisation of about 50 of these building blocks using the new Fern library (https://github.com/geoneric/fern/), an independent generic raster processing library. Fern is a highly generic software library and its algorithms can be configured according to the configuration of a modelling framework. With manageable programming effort (e.g. matching data types between programming and domain language) we created a binding between Fern and PCRaster. The resulting PCRaster Python multicore module can be used to execute existing PCRaster models without having to make any changes to the model code. We show initial results on synthetic and geoscientific models indicating significant runtime improvements provided by parallel local and focal operations. We further outline challenges in improving remaining algorithms such as flow operations over digital elevation maps and further potential improvements like enhancing disk I/O.
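A focal (moving-window) raster operation of the kind parallelized here can be sketched with Python's multiprocessing: split the raster into row strips, add one ghost row on each side so windows at strip boundaries are complete, and stitch the partial results. This is a toy sketch, not PCRaster or Fern code; the raster and strip counts are invented.

import numpy as np
from multiprocessing import Pool

def focal_mean(block):
    # 3x3 focal mean over a block whose first and last rows are ghost rows;
    # columns are edge-padded locally since strips span the full raster width.
    out = np.empty((block.shape[0] - 2, block.shape[1]))
    padded = np.pad(block, ((0, 0), (1, 1)), mode="edge")
    for i in range(1, block.shape[0] - 1):
        for j in range(block.shape[1]):
            out[i - 1, j] = padded[i - 1:i + 2, j:j + 3].mean()
    return out

def parallel_focal_mean(raster, nworkers=4):
    # Split rows into strips with one ghost row on each side (raster edges
    # are duplicated), farm strips out to workers, stitch results back.
    padded = np.pad(raster, ((1, 1), (0, 0)), mode="edge")
    cuts = np.linspace(1, padded.shape[0] - 1, nworkers + 1, dtype=int)
    blocks = [padded[cuts[k] - 1:cuts[k + 1] + 1] for k in range(nworkers)]
    with Pool(nworkers) as pool:
        return np.vstack(pool.map(focal_mean, blocks))

if __name__ == "__main__":
    raster = np.arange(64.0).reshape(8, 8)
    # The strip decomposition gives the same answer as a single worker.
    assert np.allclose(parallel_focal_mean(raster), parallel_focal_mean(raster, 1))
    print(parallel_focal_mean(raster)[0, :3])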
Parallel image logical operations using cross correlation
NASA Technical Reports Server (NTRS)
Strong, J. P., III
1972-01-01
Methods are presented for counting areas in an image in a parallel manner using noncoherent optical techniques. The techniques presented include the Levialdi algorithm for counting, optical techniques for binary operations, and cross-correlation.
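The optical trick of realizing neighbourhood logic through correlation can be mimicked digitally: cross-correlating a binary image with an all-ones kernel counts the ones in each window, so thresholding the counts yields windowed AND and OR. A small Python/SciPy sketch (the image is invented, and this is an analogy to, not a reproduction of, the optical method):

import numpy as np
from scipy.signal import correlate2d

img = np.array([[1, 1, 0, 0, 0],
                [1, 1, 0, 1, 0],
                [0, 0, 0, 0, 0],
                [0, 1, 0, 0, 1]])

# Cross-correlation with an all-ones kernel counts ones per 2x2 window.
k = np.ones((2, 2), dtype=int)
counts = correlate2d(img, k, mode="valid")

print((counts == k.size).astype(int))   # windowed logical AND
print((counts >= 1).astype(int))        # windowed logical OR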
NASA Astrophysics Data System (ADS)
Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.
2016-09-01
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.
Palkowski, Marek; Bielecki, Wlodzimierz
2018-01-15
RNA folding is an ongoing compute-intensive task of bioinformatics. Parallelization and improving code locality for this kind of algorithms is one of the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques are within the iteration space slicing framework - the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem in generating parallel tiled code is defining a proper tile size and tile dimension, which impact the degree of parallelism and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (parameters are variables defining tile size). For this purpose, we first generate two nonparametric tiled codes with different fixed tile sizes but with the same code structure and then derive a general affine model, which describes all integer factors available in expressions of those codes. Using this model and known integer factors present in the mentioned expressions (they define the left-hand side of the model), we find unknown integers in this model for each integer factor available in the same fixed tiled code position and replace in this code expressions, including integer factors, with those including parameters. Then we use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover in a given search space the best tile size and tile dimension maximizing target code performance. For a given search space, the presented approach allows us to choose the best tile size and tile dimension in parallel tiled code implementing Nussinov's RNA folding. Experimental results, received on modern Intel multi-core processors, demonstrate that this code outperforms known closely related implementations when the length of RNA strands is bigger than 2500.
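For context, the recurrence being tiled is Nussinov's dynamic program over RNA substring spans; a plain, untiled Python version (the sequence and minimum loop length are invented) looks like this:

def nussinov(seq, min_loop=1):
    # Nussinov recurrence: N[i][j] = max of leaving j unpaired (N[i][j-1])
    # and pairing j with some k (N[i][k-1] + N[k+1][j-1] + 1).
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i][j - 1]                   # j unpaired
            for k in range(i, j - min_loop):     # j paired with k
                if (seq[k], seq[j]) in pairs:
                    left = N[i][k - 1] if k > i else 0
                    best = max(best, left + N[k + 1][j - 1] + 1)
            N[i][j] = best
    return N[0][n - 1]

print(nussinov("GGGAAAUCC"))   # maximum number of base pairs for this toy strand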
Large-Scale, Parallel, Multi-Sensor Data Fusion in the Cloud
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Manipon, G.; Hua, H.
2012-12-01
NASA's Earth Observing System (EOS) is an ambitious facility for studying global climate change. The mandate now is to combine measurements from the instruments on the "A-Train" platforms (AIRS, AMSR-E, MODIS, MISR, MLS, and CloudSat) and other Earth probes to enable large-scale studies of climate change over periods of years to decades. However, moving from predominantly single-instrument studies to a multi-sensor, measurement-based model for long-duration analysis of important climate variables presents serious challenges for large-scale data mining and data fusion. For example, one might want to compare temperature and water vapor retrievals from one instrument (AIRS) to another instrument (MODIS), and to a model (ECMWF), stratify the comparisons using a classification of the "cloud scenes" from CloudSat, and repeat the entire analysis over years of AIRS data. To perform such an analysis, one must discover & access multiple datasets from remote sites, find the space/time "matchups" between instruments swaths and model grids, understand the quality flags and uncertainties for retrieved physical variables, assemble merged datasets, and compute fused products for further scientific and statistical analysis. To efficiently assemble such decade-scale datasets in a timely manner, we are utilizing Elastic Computing in the Cloud and parallel map/reduce-based algorithms. "SciReduce" is a Hadoop-like parallel analysis system, programmed in parallel python, that is designed from the ground up for Earth science. SciReduce executes inside VMWare images and scales to any number of nodes in the Cloud. Unlike Hadoop, in which simple tuples (keys & values) are passed between the map and reduce functions, SciReduce operates on bundles of named numeric arrays, which can be passed in memory or serialized to disk in netCDF4 or HDF5. Thus, SciReduce uses the native datatypes (geolocated grids, swaths, and points) that geo-scientists are familiar with. We are deploying within SciReduce a versatile set of python operators for data lookup, access, subsetting, co-registration, mining, fusion, and statistical analysis. All operators take in sets of geo-located arrays and generate more arrays. Large, multi-year satellite and model datasets are automatically "sharded" by time and space across a cluster of nodes so that years of data (millions of granules) can be compared or fused in a massively parallel way. Input variables (arrays) are pulled on-demand into the Cloud using OPeNDAP or webification URLs, thereby minimizing the size of the stored input and intermediate datasets. A typical map function might assemble and quality control AIRS Level-2 water vapor profiles for a year of data in parallel, then a reduce function would average the profiles in lat/lon bins (again, in parallel), and a final reduce would aggregate the climatology and write it to output files. We are using SciReduce to automate the production of multiple versions of a multi-year water vapor climatology (AIRS & MODIS), stratified by Cloudsat cloud classification, and compare it to models (ECMWF & MERRA reanalysis). We will present the architecture of SciReduce, describe the achieved "clock time" speedups in fusing huge datasets on our own nodes and in the Amazon Cloud, and discuss the Cloud cost tradeoffs for storage, compute, and data transfer.
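The map/reduce pattern described (assemble per-granule values, then aggregate into spatial bins) can be sketched in a few lines of pure Python; the records, bin width, and function names below are invented, and the real system distributes both phases across cluster nodes:

from collections import defaultdict
from functools import reduce

def mapper(record):
    # Map: one ((lat_bin, lon_bin), (sum, count)) pair per retrieved value,
    # here on a 10x10-degree grid.
    lat, lon, value = record
    return (int(lat // 10), int(lon // 10)), (value, 1)

def combine(a, b):
    # Reduce: merge partial (sum, count) pairs. Associative and commutative,
    # so partial reductions can run on any node in any order.
    return a[0] + b[0], a[1] + b[1]

def mapreduce(records):
    partials = defaultdict(list)
    for key, val in map(mapper, records):   # map phase (parallel in reality)
        partials[key].append(val)
    return {k: reduce(combine, vs) for k, vs in partials.items()}

records = [(34.2, -118.1, 1.3), (35.9, -117.0, 1.7), (-12.0, 44.5, 0.9)]
means = {k: s / c for k, (s, c) in mapreduce(records).items()}
print(means)   # per-bin means, e.g. {(3, -12): 1.5, (-2, 4): 0.9}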
Invariant operators, orthogonal bases and correlators in general tensor models
NASA Astrophysics Data System (ADS)
Diaz, Pablo; Rey, Soo-Jong
2018-07-01
We study invariant operators in general tensor models. We show that representation theory provides an efficient framework to count and classify invariants in tensor models of (gauge) symmetry Gd = U (N1) ⊗ ⋯ ⊗ U (Nd). As a continuation and completion of our earlier work, we present two natural ways of counting invariants, one for arbitrary Gd and another valid for large rank of Gd. We construct bases of invariant operators based on the counting, and compute correlators of their elements. The basis associated with finite rank of Gd diagonalizes the two-point function of the free theory. It is analogous to the restricted Schur basis used in matrix models. We show that the constructions become almost identical as we swap the Littlewood-Richardson numbers in multi-matrix models with Kronecker coefficients in general tensor models. We explore the parallelism between matrix models and tensor models in depth from the perspective of representation theory and comment on several ideas for future investigation.
NASA Technical Reports Server (NTRS)
Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Amador, Arthur V.; Spitale, Joseph N.
1993-01-01
Robotic spacecraft are controlled by sets of commands called 'sequences.' These sequences must be checked against mission constraints. Making our existing constraint checking program faster would enable new capabilities in our uplink process. Therefore, we are rewriting this program to run on a parallel computer. To do so, we had to determine how to run constraint-checking algorithms in parallel and create a new method of specifying spacecraft models and constraints. This new specification gives us a means of representing flight systems and their predicted response to commands which could be used in a variety of applications throughout the command process, particularly during anomaly or high-activity operations. This commonality could reduce operations cost and risk for future complex missions. Lessons learned in applying some parts of this system to the TOPEX/Poseidon mission will be described.
Bellucci, Michael A; Coker, David F
2011-07-28
We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
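A stripped-down sketch of the two-level idea (not the authors' code): a lower-level population search whose mutation-operator probabilities are re-weighted by an upper level according to which operator produced improvements. Everything below, including the toy fitness function and the operator set, is invented for illustration.

import random

target = [1.0, -2.0, 0.5]
def fitness(x):
    return -sum((a - b) ** 2 for a, b in zip(x, target))

def small_step(x):
    return [a + random.gauss(0, 0.05) for a in x]

def big_step(x):
    return [a + random.gauss(0, 1.0) for a in x]

def reset_one(x):
    i = random.randrange(len(x))
    return [random.uniform(-3, 3) if j == i else a for j, a in enumerate(x)]

operators = {"small_step": small_step, "big_step": big_step, "reset_one": reset_one}
probs = {name: 1.0 / len(operators) for name in operators}

pop = [[random.uniform(-3, 3) for _ in range(3)] for _ in range(20)]
for gen in range(200):
    credit = {name: 1e-9 for name in operators}   # improvement count per operator
    nextpop = []
    for x in pop:
        name = random.choices(list(operators), weights=[probs[o] for o in operators])[0]
        child = operators[name](x)
        if fitness(child) > fitness(x):           # keep improvements, credit the operator
            credit[name] += 1.0
            x = child
        nextpop.append(x)
    pop = sorted(nextpop, key=fitness, reverse=True)[:10] * 2   # truncation selection
    total = sum(credit.values())                  # upper level: re-weight the operators
    probs = {o: 0.9 * probs[o] + 0.1 * credit[o] / total for o in operators}

print(max(pop, key=fitness), probs)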
Implementation of a Parallel Kalman Filter for Stratospheric Chemical Tracer Assimilation
NASA Technical Reports Server (NTRS)
Chang, Lang-Ping; Lyster, Peter M.; Menard, R.; Cohn, S. E.
1998-01-01
A Kalman filter for the assimilation of long-lived atmospheric chemical constituents has been developed for two-dimensional transport models on isentropic surfaces over the globe. An important attribute of the Kalman filter is that it calculates error covariances of the constituent fields using the tracer dynamics. Consequently, the current Kalman-filter assimilation is a five-dimensional problem (coordinates of two points and time), and it can only be handled on computers with large memory and high floating point speed. In this paper, an implementation of the Kalman filter for distributed-memory, message-passing parallel computers is discussed. Two approaches were studied: an operator decomposition and a covariance decomposition. The latter was found to be more scalable than the former, and it possesses the property that the dynamical model does not need to be parallelized, which is of considerable practical advantage. This code is currently used to assimilate constituent data retrieved by limb sounders on the Upper Atmosphere Research Satellite. Tests of the code examined the variance transport and observability properties. Aspects of the parallel implementation, some timing results, and a brief discussion of the physical results will be presented.
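The covariance decomposition's key property, that the dynamical model need not be parallelized, can be shown in a small sketch: propagating P -> M P M^T only ever applies the serial model to individual columns, and each rank can own a slice of columns. A toy Python version (the cyclic-shift "model" and sizes are invented; symmetry of P is assumed):

import numpy as np

def model_step(x):
    # The serial dynamical model acting on one state vector (here a toy
    # cyclic advection); it knows nothing about parallelism.
    return np.roll(x, 1)

def propagate_covariance(P):
    # Forecast step P -> M P M^T written as two passes of the serial model
    # over columns; with P symmetric, applying M to the columns of (M P)^T
    # yields M P M^T directly. In the covariance decomposition each MPI
    # rank owns a slice of these columns and runs exactly this loop.
    MP = np.column_stack([model_step(P[:, j]) for j in range(P.shape[1])])
    MPt = MP.T
    return np.column_stack([model_step(MPt[:, j]) for j in range(MPt.shape[1])])

n = 5
P = np.diag(np.arange(1.0, n + 1))           # toy symmetric covariance
P1 = propagate_covariance(P)
print(np.allclose(P1, P1.T), np.diag(P1))    # symmetry kept; variances advected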
Mechanism to support generic collective communication across a variety of programming models
Almasi, Gheorghe [Ardsley, NY; Dozsa, Gabor [Ardsley, NY; Kumar, Sameer [White Plains, NY
2011-07-19
A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
1987-12-01
[Figure residue: block diagrams (Figures 2 and 5) of an intelligent disk controller and a processor-per-head architecture, showing application programs, a database management system, an operating system, a disk controller, and the host.] ... However, these additional properties have been proven in classical set and relation theory [75]. These additional properties are described here
Using Apex To Construct CPM-GOMS Models
NASA Technical Reports Server (NTRS)
John, Bonnie; Vera, Alonso; Matessa, Michael; Freed, Michael; Remington, Roger
2006-01-01
A process for automatically generating computational models of human/computer interactions, as well as graphical and textual representations of the models, has been built on the conceptual foundation of a method known in the art as CPM-GOMS. This method is so named because it combines (1) the task decomposition of analysis according to an underlying method known in the art as the goals, operators, methods, and selection (GOMS) method with (2) a model of human resource usage at the level of cognitive, perceptual, and motor (CPM) operations. CPM-GOMS models have made accurate predictions about behaviors of skilled computer users in routine tasks, but heretofore, such models have been generated in a tedious, error-prone manual process. In the present process, CPM-GOMS models are generated automatically from a hierarchical task decomposition expressed by use of a computer program, known as Apex, designed previously to be used to model human behavior in complex, dynamic tasks. An inherent capability of Apex for scheduling of resources automates the difficult task of interleaving the cognitive, perceptual, and motor resources that underlie common task operators (e.g., move and click mouse). The user interface of Apex automatically generates Program Evaluation Review Technique (PERT) charts, which enable modelers to visualize the complex parallel behavior represented by a model. Because interleaving and the generation of displays to aid visualization are automated, it is now feasible to construct arbitrarily long sequences of behaviors. The process was tested by using Apex to create a CPM-GOMS model of a relatively simple human/computer-interaction task and comparing the time predictions of the model with measurements of the times taken by human users in performing the various steps of the task. The task was to withdraw $80 in cash from an automated teller machine (ATM). For the test, a Visual Basic mockup of an ATM was created, with a provision for input from (and measurement of the performance of) the user via a mouse. The times predicted by the automatically generated model turned out to approximate the measured times fairly well (see figure). While these results are promising, there is a need for further development of the process. Moreover, it will also be necessary to test other, more complex models: the actions required of the user in the ATM task are too sequential to involve substantial parallelism and interleaving and, hence, do not serve as an adequate test of the unique strength of CPM-GOMS models to accommodate parallelism and interleaving.
Effecting a broadcast with an allreduce operation on a parallel computer
Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.
2010-11-02
A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer are configured. Each element of the logical root's contribution in the send buffer is contributed, while one or more zeros corresponding to the size of the element are injected by the remaining nodes. An allreduce operation with a bitwise OR is performed using the element and the injected zeros, and the result of the allreduce operation is determined and stored in each receive buffer.
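The broadcast-by-allreduce trick generalizes beyond this patent: with plain MPI, if every non-root rank contributes zeros, a bitwise-OR allreduce reproduces the root's buffer everywhere. A minimal sketch (standard MPI calls only; the payload values are arbitrary):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  const int root = 0, n = 4;
  std::vector<unsigned> send(n, 0u), recv(n, 0u);   // non-roots keep zeros
  if (rank == root)
    for (int i = 0; i < n; ++i) send[i] = 42u + i;  // root's payload
  // OR-ing the root's bits with everyone else's zeros acts as a broadcast.
  MPI_Allreduce(send.data(), recv.data(), n, MPI_UNSIGNED, MPI_BOR,
                MPI_COMM_WORLD);
  MPI_Finalize();  // recv now equals the root's payload on every rank
}
```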
NASA Technical Reports Server (NTRS)
Waller, Marvin C.; Scanlon, Charles H.
1999-01-01
A number of our nation's airports depend on closely spaced parallel runway operations to handle their normal traffic throughput when weather conditions are favorable. For safety, these operations are curtailed in Instrument Meteorological Conditions (IMC) when the ceiling or visibility deteriorates, and operations in many cases are limited to the equivalent of a single runway. Where parallel runway spacing is less than 2500 feet, the capacity loss in IMC is on the order of 50 percent for these runways. Clearly, these capacity losses result in landing delays, inconvenience to the public, increased operational costs to the airlines, and general interruption of commerce. This document presents a description and the results of a fixed-base simulation study to evaluate an initial concept that includes a set of procedures for conducting safe flight in closely spaced parallel runway operations in IMC. Consideration of flight-deck information technology and displays to support the procedures is also included in the discussion. The procedures and supporting technology rely heavily on airborne capabilities operating in conjunction with the air traffic control system.
An abstract model for radiative transfer in an atmosphere with reflection by the planetary surface
NASA Astrophysics Data System (ADS)
Greenberg, W.; van der Mee, C. V. M.
1985-07-01
A Hilbert-space model is developed that applies to radiative transfer in a homogeneous, plane-parallel planetary atmosphere. Reflection and absorption by the planetary surface are taken into account by imposing a reflective boundary condition. The existence and uniqueness of the solution of this boundary value problem are established by proving the invertibility of a scattering operator using the Fredholm alternative.
Expressing Parallelism with ROOT
NASA Astrophysics Data System (ADS)
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
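For readers unfamiliar with these interfaces, implicit multi-threading in recent ROOT versions is enabled with a single call; the sketch below uses the RDataFrame interface (called TDataFrame around the time of this paper), and the tree, file, and branch names are made up for illustration:

```cpp
#include <ROOT/RDataFrame.hxx>
#include <TROOT.h>

int main() {
  ROOT::EnableImplicitMT();                    // let ROOT parallelize internally
  ROOT::RDataFrame df("Events", "data.root");  // hypothetical tree and file
  // The filter and histogram fill run in parallel over ranges of entries.
  auto h = df.Filter("pt > 20.0").Histo1D("pt");
  h->Draw();                                   // the event loop triggers here
  return 0;
}
```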
NASA Astrophysics Data System (ADS)
Sun, Dongye; Lin, Xinyou; Qin, Datong; Deng, Tao
2012-11-01
Energy management (EM) is a core technique for hybrid electric buses (HEB) to optimize fuel economy, and it is unique to the corresponding powertrain configuration. Existing control strategies seldom coordinate battery power management with internal combustion engine power management. In this paper, a power-balancing instantaneous optimization (PBIO) energy management control strategy is proposed for a novel series-parallel hybrid electric bus. According to the characteristics of the novel series-parallel architecture, the switching boundary condition between series and parallel modes as well as the control rules of the power-balancing strategy are developed. An equivalent fuel model of the battery is implemented and combined with the engine fuel model to form the objective function, which minimizes fuel consumption at each sampled time and coordinates the real-time power distribution between the engine and the battery. To validate the proposed strategy, a forward model is built in Matlab/Simulink for simulation, and a dSPACE AutoBox is applied as the controller for hardware-in-the-loop bench testing. Both the simulation and hardware-in-the-loop results demonstrate that the proposed strategy not only sustains the battery SOC within its operational range and keeps the engine operating near its peak-efficiency region, but also improves the fuel economy of the series-parallel hybrid electric bus (SPHEB) by up to 30.73% compared with the prototype bus; relative to a rule-based strategy, the PBIO strategy reduces fuel consumption by up to 12.38%. The proposed approach ensures that the PBIO algorithm is applicable in real time, improves the efficiency of the SPHEB system, and suits complicated configurations well.
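The instantaneous-optimization step can be pictured as a small search at each sample time. The sketch below is a generic equivalent-consumption-style search, not the paper's actual PBIO formulation; the fuel map, equivalence factors, and power limits are invented placeholders:

```cpp
#include <iostream>

// Toy engine fuel map (g/s vs. engine power in kW); placeholder coefficients.
double engineFuelRate(double Pe) { return 0.2 + 0.05 * Pe + 0.0004 * Pe * Pe; }

// Fuel-equivalent cost of battery power (g/s); the equivalence factor and the
// charging credit are invented numbers, not the paper's battery model.
double batteryEquivFuelRate(double Pb) {
  return (Pb >= 0.0) ? 0.06 * Pb : 0.06 * 0.4 * Pb;
}

// Grid-search the engine power minimizing total (real + equivalent) fuel rate,
// with battery power covering the remainder of the demand.
double bestEnginePower(double Pdemand, double PeMin, double PeMax) {
  double bestPe = PeMin, bestCost = 1e300;
  for (double Pe = PeMin; Pe <= PeMax; Pe += 0.5) {
    double cost = engineFuelRate(Pe) + batteryEquivFuelRate(Pdemand - Pe);
    if (cost < bestCost) { bestCost = cost; bestPe = Pe; }
  }
  return bestPe;
}

int main() { std::cout << bestEnginePower(60.0, 0.0, 120.0) << " kW\n"; }
```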
NASA Astrophysics Data System (ADS)
Larour, Eric; Utke, Jean; Bovin, Anton; Morlighem, Mathieu; Perez, Gilberto
2016-11-01
Within the framework of sea-level rise projections, there is a strong need for hindcast validation of the evolution of polar ice sheets in a way that tightly matches observational records (from radar, gravity, and altimetry observations mainly). However, the computational requirements for making hindcast reconstructions possible are severe and rely mainly on the evaluation of the adjoint state of transient ice-flow models. Here, we look at the computation of adjoints in the context of the NASA/JPL/UCI Ice Sheet System Model (ISSM), written in C++ and designed for parallel execution with MPI. We present the adaptations required in the way the software is designed and written, as well as generic adaptations in the tools facilitating the adjoint computations. We concentrate on the use of operator overloading coupled with the AdjoinableMPI library to achieve the adjoint computation of the ISSM. We present a comprehensive approach to (1) carry out type changing through the ISSM, hence facilitating operator overloading, (2) bind to external solvers such as MUMPS and GSL-LU, and (3) handle MPI-based parallelism to scale the capability. We demonstrate the success of the approach by computing sensitivities of hindcast metrics such as the misfit to observed records of surface altimetry on the northeastern Greenland Ice Stream, or the misfit to observed records of surface velocities on Upernavik Glacier, central West Greenland. We also provide metrics for the scalability of the approach, and the expected performance. This approach has the potential to enable a new generation of hindcast-validated projections that make full use of the wealth of datasets currently being collected, or already collected, in Greenland and Antarctica.
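Operator-overloading AD works by swapping the floating-point type and overloading arithmetic so derivative information propagates with the values. A minimal forward-mode illustration of that mechanism follows (ISSM's tool chain computes reverse-mode adjoints, so this shows only the underlying principle, not ISSM code):

```cpp
#include <cmath>
#include <iostream>

// A dual number carries a value and its derivative; overloaded operators
// apply the chain rule automatically as the computation runs.
struct Dual {
  double v, d;
};
Dual operator+(Dual a, Dual b) { return {a.v + b.v, a.d + b.d}; }
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.d * b.v + a.v * b.d}; }
Dual sin(Dual a) { return {std::sin(a.v), std::cos(a.v) * a.d}; }

int main() {
  Dual x{2.0, 1.0};                        // seed dx/dx = 1
  Dual y = x * x + sin(x);                 // y = x^2 + sin(x)
  std::cout << y.v << " " << y.d << "\n";  // derivative is 2x + cos(x)
}
```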
NASA Astrophysics Data System (ADS)
Perez, G. L.; Larour, E. Y.; Morlighem, M.
2016-12-01
Within the framework of sea-level rise projections, there is a strong need for hindcast validation of the evolution of polar ice sheets in a way that tightly matches observational records (from radar and altimetry observations mainly). However, the computational requirements for making hindcast reconstructions possible are severe and rely mainly on the evaluation of the adjoint state of transient ice-flow models. Here, we look at the computation of adjoints in the context of the NASA/JPL/UCI Ice Sheet System Model, written in C++ and designed for parallel execution with MPI. We present the adaptations required in the way the software is designed and written, as well as generic adaptations in the tools facilitating the adjoint computations. We concentrate on the use of operator overloading coupled with the AdjoinableMPI library to achieve the adjoint computation of ISSM. We present a comprehensive approach to (1) carry out type changing through ISSM, hence facilitating operator overloading, (2) bind to external solvers such as MUMPS and GSL-LU, and (3) handle MPI-based parallelism to scale the capability. We demonstrate the success of the approach by computing sensitivities of hindcast metrics such as the misfit to observed records of surface altimetry on the North-East Greenland Ice Stream, or the misfit to observed records of surface velocities on Upernavik Glacier, Central West Greenland. We also provide metrics for the scalability of the approach, and the expected performance. This approach has the potential to enable a new generation of hindcast-validated projections that make full use of the wealth of datasets currently being collected, or already collected, in Greenland and Antarctica, such as surface altimetry, surface velocities, and/or gravity measurements.
Numerical Prediction of CCV in a PFI Engine using a Parallel LES Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ameen, Muhsin M; Mirzaeian, Mohsen; Millo, Federico
Cycle-to-cycle variability (CCV) is detrimental to IC engine operation and can lead to partial burn, misfire, and knock. Predicting CCV numerically is extremely challenging for two key reasons. Firstly, high-fidelity methods such as large eddy simulation (LES) are required to accurately resolve the in-cylinder turbulent flowfield both spatially and temporally. Secondly, CCV is experienced over long timescales and hence the simulations need to be performed for hundreds of consecutive cycles. Ameen et al. (Int. J. Eng. Res., 2017) developed a parallel perturbation model (PPM) approach to dissociate this long time-scale problem into several shorter timescale problems. The strategy is to perform multiple single-cycle simulations in parallel by effectively perturbing the initial velocity field based on the intensity of the in-cylinder turbulence. This strategy was demonstrated for a motored engine and it was shown that the mean and variance of the in-cylinder flowfield were captured reasonably well by this approach. In the present study, this PPM approach is extended to simulate the CCV in a fired port-fuel-injected (PFI) SI engine. Two operating conditions are considered: a medium-CCV operating case corresponding to 2500 rpm and 16 bar BMEP, and a low-CCV case corresponding to 4000 rpm and 12 bar BMEP. The predictions from this approach are also shown to be similar to the consecutive LES cycles. Both the consecutive and PPM LES cycles are observed to under-predict the variability in the early stage of combustion. The parallel approach slightly underpredicts the cyclic variability at all stages of combustion as compared to the consecutive LES cycles. However, it is shown that the parallel approach is able to predict the coefficient of variation (COV) of the in-cylinder pressure and burn-rate-related parameters with sufficient accuracy, and is also able to predict the qualitative trends in CCV with changing operating conditions. The convergence of the statistics predicted by the PPM approach with respect to the number of consecutive cycles required for each parallel simulation is also investigated. It is shown that this new approach is able to give accurate predictions of the CCV in fired engines in less than one-tenth of the time required for the conventional approach of simulating consecutive engine cycles.
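The core of the PPM idea, stripped of all engine specifics, is to seed an ensemble of independent single-cycle runs with perturbed initial velocity fields. A hedged sketch, where the perturbation amplitude is scaled by an assumed local turbulence-intensity field:

```cpp
#include <random>
#include <vector>

// Perturb an initial velocity field u by Gaussian noise whose amplitude is
// scaled by the local turbulence intensity uRms; each ensemble member gets
// its own seed and would then be simulated independently, in parallel.
std::vector<double> perturb(const std::vector<double>& u,
                            const std::vector<double>& uRms, unsigned seed) {
  std::mt19937 gen(seed);
  std::normal_distribution<double> noise(0.0, 1.0);
  std::vector<double> out(u.size());
  for (std::size_t i = 0; i < u.size(); ++i)
    out[i] = u[i] + uRms[i] * noise(gen);
  return out;
}

int main() {
  std::vector<double> u(1000, 10.0), uRms(1000, 1.5);  // toy fields
  for (unsigned member = 0; member < 8; ++member) {
    std::vector<double> u0 = perturb(u, uRms, member);
    (void)u0;  // a real driver would hand u0 to an independent LES run
  }
}
```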
Everyone Is on Welfare: "The Role of Redistribution in Social Policy" Revisited.
ERIC Educational Resources Information Center
Abramovitz, Mimi
1983-01-01
Uses Titmuss's model of three welfare systems to analyze how the rapidly expanding system of tax benefits available to middle- and upper-income groups parallels the social welfare system for the poor. Operating as a shadow welfare state, the tax benefits system contributes to the upward redistribution of income. (Author/JAC)
The electrical characteristics of the dielectric barrier discharges
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yehia, Ashraf, E-mail: yehia30161@yahoo.com; Department of Physics, Faculty of Science, Assiut University, Assiut 71516
2016-06-15
The electrical characteristics of the dielectric barrier discharges have been studied in this paper under different operating conditions. The dielectric barrier discharges were formed inside two reactors composed of electrodes in the shape of two parallel plates. The dielectric layers inside these reactors were pasted on the surface of one electrode only in the first reactor and on the surfaces of the two electrodes in the second reactor. The reactor under study has been fed by atmospheric air that flowed inside it with a constant rate at the normal temperature and pressure, in parallel with applying a sinusoidal ac voltage between the electrodes of the reactor. The amount of the electric charge that flows from the reactors to the external circuit has been studied experimentally versus the ac peak voltage applied to them. An analytical model has been obtained for calculating the electrical characteristics of the dielectric barrier discharges that were formed inside the reactors during a complete cycle of the ac voltage. The results that were calculated by using this model have agreed well with the experimental results under the different operating conditions.
Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Faraj, Ahmad [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
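One plausible way to realize this two-phase structure with standard MPI is to group ranks by per-node core index into cross-node communicators (the "rings"), allreduce within each ring, and then combine the ring results node-locally. The communicator construction below is an illustration, not the patented implementation, and MPI_Allreduce abstracts away the actual ring passing:

```cpp
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int world;
  MPI_Comm_rank(MPI_COMM_WORLD, &world);
  // Node-local communicator: all ranks (processing cores) sharing a node.
  MPI_Comm node;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                      &node);
  int core;
  MPI_Comm_rank(node, &core);  // per-node core index
  // "Logical ring": the set of ranks with the same core index, one per node.
  MPI_Comm ring;
  MPI_Comm_split(MPI_COMM_WORLD, core, world, &ring);
  double contrib = static_cast<double>(world), ringSum = 0.0, total = 0.0;
  MPI_Allreduce(&contrib, &ringSum, 1, MPI_DOUBLE, MPI_SUM, ring);  // global phase
  MPI_Allreduce(&ringSum, &total, 1, MPI_DOUBLE, MPI_SUM, node);    // local phase
  std::printf("rank %d: total = %g\n", world, total);
  MPI_Finalize();
}
```

Because the rings partition the cores, the node-local combination of ring sums yields the full allreduce result on every core.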
Effective Vectorization with OpenMP 4.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham
This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
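A minimal example of the OpenMP 4.5 SIMD directives discussed above: the loop is explicitly marked for vectorization, and the called function is given a SIMD variant so the call does not block vectorization (function and variable names are invented):

```cpp
#include <cstddef>
#include <iostream>

#pragma omp declare simd            // ask the compiler for a vector variant
inline float scale(float x) { return 2.0f * x; }

void saxpy(float* y, const float* x, float a, std::size_t n) {
  #pragma omp simd                  // explicitly mark the loop for vectorization
  for (std::size_t i = 0; i < n; ++i)
    y[i] = a * x[i] + scale(y[i]);
}

int main() {
  float x[8] = {1, 2, 3, 4, 5, 6, 7, 8}, y[8] = {};
  saxpy(y, x, 0.5f, 8);
  std::cout << y[7] << "\n";        // prints 4
}
```

Compiled with OpenMP enabled (e.g. -fopenmp), the directives instruct rather than merely permit vectorization, which is exactly the kind of hint the paper describes.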
Constructing Neuronal Network Models in Massively Parallel Environments.
Ippen, Tammo; Eppler, Jochen M; Plesser, Hans E; Diesmann, Markus
2017-01-01
Recent advances in the development of data structures to represent spiking neuron network models enable us to exploit the complete memory of petascale computers for a single brain-scale network simulation. In this work, we investigate how well we can exploit the computing power of such supercomputers for the creation of neuronal networks. Using an established benchmark, we divide the runtime of simulation code into the phase of network construction and the phase during which the dynamical state is advanced in time. We find that on multi-core compute nodes network creation scales well with process-parallel code but exhibits a prohibitively large memory consumption. Thread-parallel network creation, in contrast, exhibits speedup only up to a small number of threads but has little overhead in terms of memory. We further observe that the algorithms creating instances of model neurons and their connections scale well for networks of ten thousand neurons, but do not show the same speedup for networks of millions of neurons. Our work uncovers that the lack of scaling of thread-parallel network creation is due to inadequate memory allocation strategies and demonstrates that thread-optimized memory allocators recover excellent scaling. An analysis of the loop order used for network construction reveals that more complex tests on the locality of operations significantly improve scaling and reduce runtime by allowing construction algorithms to step through large networks more efficiently than in existing code. The combination of these techniques increases performance by an order of magnitude and harnesses the increasingly parallel compute power of the compute nodes in high-performance clusters and supercomputers.
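The allocator observation can be made concrete with a toy sketch: each thread appends connections to its own container, so allocation stays thread-local and contention-free. The connection rule below is an arbitrary placeholder, not the benchmark's model:

```cpp
#include <omp.h>
#include <vector>

struct Conn { int src, tgt; };

// Each thread fills its own buffer, keeping memory allocation thread-local;
// the source-selection rule here is an arbitrary toy.
std::vector<std::vector<Conn>> buildConnections(int nNeurons, int fanOut) {
  std::vector<std::vector<Conn>> perThread(omp_get_max_threads());
  #pragma omp parallel
  {
    std::vector<Conn>& mine = perThread[omp_get_thread_num()];
    #pragma omp for
    for (int tgt = 0; tgt < nNeurons; ++tgt)
      for (int k = 0; k < fanOut; ++k)
        mine.push_back({(tgt * 31 + k) % nNeurons, tgt});
  }
  return perThread;
}

int main() { auto conns = buildConnections(10000, 100); }
```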
NASA Technical Reports Server (NTRS)
Springer, P.
1993-01-01
This paper discusses the method by which the Cascade-Correlation algorithm was parallelized so that it could be run using the Time Warp Operating System (TWOS). TWOS is a special-purpose operating system designed to run parallel discrete event simulations with maximum efficiency on parallel or distributed computers.
Tempest: Accelerated MS/MS database search software for heterogeneous computing platforms
Adamo, Mark E.; Gerber, Scott A.
2017-01-01
MS/MS database search algorithms derive a set of candidate peptide sequences from in-silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU generates peptide candidates that are asynchronously sent to a discrete GPU to be scored against experimental spectra in parallel (Milloy et al., 2012). The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. PMID:27603022
Making almost commuting matrices commute
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hastings, Matthew B
Suppose two Hermitian matrices A, B almost commute (‖[A,B]‖ ≤ δ). Are they close to a commuting pair of Hermitian matrices A′, B′, with ‖A−A′‖, ‖B−B′‖ ≤ ε? A theorem of H. Lin shows that this is uniformly true, in that for every ε > 0 there exists a δ > 0, independent of the size N of the matrices, for which almost commuting implies being close to a commuting pair. However, this theorem does not specify how δ depends on ε. We give uniform bounds relating δ and ε. The proof is constructive, giving an explicit algorithm to construct A′ and B′. We provide tighter bounds in the case of block tridiagonal and tridiagonal matrices. Within the context of quantum measurement, this implies an algorithm to construct a basis in which we can make a projective measurement that approximately measures two approximately commuting operators simultaneously. Finally, we comment briefly on the case of approximately measuring three or more approximately commuting operators using POVMs (positive operator-valued measures) instead of projective measurements.
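In symbols, the uniform statement being quantified has the following shape, where ε(δ) stands for the explicit dependence the paper bounds (the bound itself is not reproduced here):

```latex
% Shape of the uniform Lin-type statement; \epsilon(\delta) denotes the
% dependence the paper makes explicit.
\[
\|[A,B]\| \le \delta \;\Longrightarrow\;
\exists\, A',B' \text{ Hermitian},\ [A',B'] = 0,\quad
\|A-A'\| \le \epsilon(\delta), \quad \|B-B'\| \le \epsilon(\delta),
\]
\[
\text{with } \epsilon(\delta) \to 0 \text{ as } \delta \to 0,
\text{ uniformly in the matrix dimension } N.
\]
```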
Enhancing GIS Capabilities for High Resolution Earth Science Grids
NASA Astrophysics Data System (ADS)
Koziol, B. W.; Oehmke, R.; Li, P.; O'Kuinghttons, R.; Theurich, G.; DeLuca, C.
2017-12-01
Applications for high performance GIS will continue to increase as Earth system models pursue more realistic representations of Earth system processes. Finer spatial resolution model input and output, unstructured or irregular modeling grids, data assimilation, and regional coordinate systems present novel challenges for GIS frameworks operating in the Earth system modeling domain. This presentation provides an overview of two GIS-driven applications that combine high performance software with big geospatial datasets to produce value-added tools for the modeling and geoscientific community. First, a large-scale interpolation experiment using National Hydrography Dataset (NHD) catchments, a high resolution rectilinear CONUS grid, and the Earth System Modeling Framework's (ESMF) conservative interpolation capability will be described. ESMF is a parallel, high-performance software toolkit that provides capabilities (e.g. interpolation) for building and coupling Earth science applications. ESMF is developed primarily by the NOAA Environmental Software Infrastructure and Interoperability (NESII) group. The purpose of this experiment was to test and demonstrate the utility of high performance scientific software in traditional GIS domains. Special attention will be paid to the nuanced requirements for dealing with high resolution, unstructured grids in scientific data formats. Second, a chunked interpolation application using ESMF and OpenClimateGIS (OCGIS) will demonstrate how spatial subsetting can virtually remove computing resource ceilings for very high spatial resolution interpolation operations. OCGIS is a NESII-developed Python software package designed for the geospatial manipulation of high-dimensional scientific datasets. An overview of the data processing workflow, why a chunked approach is required, and how the application could be adapted to meet operational requirements will be discussed here. In addition, we'll provide a general overview of OCGIS's parallel subsetting capabilities including challenges in the design and implementation of a scientific data subsetter.
Parallel CARLOS-3D code development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Putnam, J.M.; Kotulski, J.D.
1996-02-01
CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel supercomputers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao-Wilton-Glisson roof-top basis functions for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions to the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines, and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at a high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.
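For orientation, the generic Galerkin method-of-moments construction underlying such codes has this textbook shape (not the specific CARLOS-3D formulation):

```latex
% Generic Galerkin method of moments: expand the surface current in basis
% functions f_n and test with the same functions; L is the integral operator
% of the chosen surface integral equation.
\[
\mathbf{J} \approx \sum_{n=1}^{N} I_n\, \mathbf{f}_n, \qquad
Z_{mn} = \big\langle \mathbf{f}_m,\ \mathcal{L}\,\mathbf{f}_n \big\rangle, \qquad
V_m = \big\langle \mathbf{f}_m,\ \mathbf{E}^{\mathrm{inc}} \big\rangle, \qquad
Z\,I = V.
\]
```

Swapping basis families (triangular roof-top, quadrilateral, BOR) changes only how the matrix elements Z_mn are computed, which is why the code's operator structure localized the extensions described above.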
Temporal Precedence Checking for Switched Models and its Application to a Parallel Landing Protocol
NASA Technical Reports Server (NTRS)
Duggirala, Parasara Sridhar; Wang, Le; Mitra, Sayan; Viswanathan, Mahesh; Munoz, Cesar A.
2014-01-01
This paper presents an algorithm for checking temporal precedence properties of nonlinear switched systems. This class of properties subsumes bounded safety and captures requirements about visiting a sequence of predicates within given time intervals. The algorithm handles nonlinear predicates that arise from dynamics-based predictions used in alerting protocols for state-of-the-art transportation systems. It is sound and complete for nonlinear switched systems that robustly satisfy the given property. The algorithm is implemented in the Compare Execute Check Engine (C2E2) using validated simulations. As a case study, a simplified model of an alerting system for closely spaced parallel runways is considered. The proposed approach is applied to this model to check safety properties of the alerting logic for different operating conditions such as initial velocities, bank angles, aircraft longitudinal separation, and runway separation.
NASA Technical Reports Server (NTRS)
Tesch, W. A.; Steenken, W. G.
1976-01-01
Results are presented from a one-dimensional dynamic digital blade-row compressor model study of a J85-13 engine operating with uniform and with circumferentially distorted inlet flow. Details of the geometry and the derived blade-row characteristics used to simulate the clean-inlet performance are given. A stability criterion based upon the self-developing unsteady internal flows near surge provided an accurate determination of the clean-inlet surge line. The basic model was modified to include an arbitrary-extent multi-sector parallel-compressor configuration for investigating 180 deg 1/rev total pressure, total temperature, and combined total pressure and total temperature distortions. The combined distortions included opposed, coincident, and 90 deg overlapped patterns. The predicted losses in surge pressure ratio matched the measured data trends at all speeds and gave accurate predictions at high corrected speeds, where the slope of the speed lines approached the vertical.
Rethinking key–value store for parallel I/O optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kougkas, Anthony; Eslami, Hassan; Sun, Xian-He
2015-01-26
Key-value stores are being widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architecture differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as the cases with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application. We modeled the performance of these two systems using collected data from our experiments, and we provide a predictive method to identify which system offers better I/O performance given a specific workload. The results show that we can optimize the I/O performance in HPC systems by utilizing key-value stores.
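The proposal amounts to routing block I/O through a put/get interface keyed by file and block number. A toy sketch of that interface, with std::map standing in for a real distributed key-value backend (the key layout and all names are assumptions):

```cpp
#include <cstdint>
#include <map>      // stand-in for a real distributed key-value backend
#include <string>
#include <vector>

using Block = std::vector<std::uint8_t>;

// Block I/O expressed as put/get on (file, block-number) keys; heavy
// metadata and synchronization traffic becomes plain key lookups.
struct KVStore {
  std::map<std::string, Block> kv;
  void put(const std::string& file, std::uint64_t blockNo, Block data) {
    kv[file + "#" + std::to_string(blockNo)] = std::move(data);
  }
  Block get(const std::string& file, std::uint64_t blockNo) {
    return kv[file + "#" + std::to_string(blockNo)];
  }
};

int main() {
  KVStore store;
  store.put("checkpoint.dat", 7, Block(4096, 0xAB));  // write one block
  Block b = store.get("checkpoint.dat", 7);           // read it back
  return b.size() == 4096 ? 0 : 1;
}
```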
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.
Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano
2014-09-09
A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
An object-oriented approach to nested data parallelism
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.; Chatterjee, Siddhartha
1994-01-01
This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' However, few current programming languages support nested data parallelism. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
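A toy rendering of the collection-plus-foreach style described above; a real implementation would flatten the nested calls into flat data-parallel operations, whereas this stand-in simply nests loops (all names are invented):

```cpp
#include <vector>

template <class T> using Vec = std::vector<T>;

// Sequential stand-in for the elementwise parallel construct; a compiler
// for the approach above would flatten nested uses instead of nesting loops.
template <class T, class F>
void foreach(Vec<T>& v, F f) {
  for (auto& e : v) f(e);
}

int main() {
  Vec<Vec<int>> nested = {{1, 2}, {3, 4, 5}};
  // Nested data parallelism: an elementwise op applied inside an elementwise op.
  foreach(nested, [](Vec<int>& row) {
    foreach(row, [](int& x) { x *= 2; });
  });
}
```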
UPC++ Programmer’s Guide (v1.0 2017.9)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bachan, J.; Baden, S.; Bonachea, D.
UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.
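A hedged sketch in the style of the UPC++ v1.0 API: rank 0 allocates an integer in its shared segment, the global pointer is broadcast, and every rank performs an explicit asynchronous remote get:

```cpp
#include <upcxx/upcxx.hpp>
#include <iostream>

int main() {
  upcxx::init();
  upcxx::global_ptr<int> gp;
  if (upcxx::rank_me() == 0)
    gp = upcxx::new_<int>(42);                // allocate in the shared segment
  gp = upcxx::broadcast(gp, 0).wait();        // share the global pointer
  int v = upcxx::rget(gp).wait();             // explicit, asynchronous remote read
  std::cout << "rank " << upcxx::rank_me() << " read " << v << "\n";
  upcxx::finalize();
}
```

Note how both the communication (rget) and its completion (wait) are visible in the source, reflecting the design point that remote access is never implicit.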
THERMAL DESIGN OF THE ITER VACUUM VESSEL COOLING SYSTEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carbajo, Juan J; Yoder Jr, Graydon L; Kim, Seokho H
RELAP5-3D models of the ITER Vacuum Vessel (VV) Primary Heat Transfer System (PHTS) have been developed. The design of the cooling system is described in detail, and RELAP5 results are presented. Two parallel pump/heat exchanger trains comprise the design: one train is for full-power operation and the other is for emergency operation or operation at decay heat levels. All the components are located inside the Tokamak building (a significant change from the original configurations). The results presented include operation at full power, decay heat operation, and baking operation. The RELAP5-3D results confirm that the design can operate satisfactorily during both normal pulsed power operation and decay heat operation. All the temperatures in the coolant and in the different system components are maintained within acceptable operating limits.
Parallel momentum input by tangential neutral beam injections in stellarator and heliotron plasmas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, S., E-mail: nishimura.shin@lhd.nifs.ac.jp; Nakamura, Y.; Nishioka, K.
The configuration dependence of parallel momentum inputs to target plasma particle species by tangentially injected neutral beams is investigated in non-axisymmetric stellarator/heliotron model magnetic fields by assuming the existence of magnetic flux-surfaces. In parallel friction integrals of the full Rosenbluth-MacDonald-Judd collision operator in thermal particles' kinetic equations, numerically obtained eigenfunctions are used for excluding trapped fast ions that cannot contribute to the friction integrals. It is found that the momentum inputs to thermal ions strongly depend on magnetic field strength modulations on the flux-surfaces, while the input to electrons is insensitive to the modulation. In future plasma flow studies requiring flow calculations of all particle species in more general non-symmetric toroidal configurations, the eigenfunction method investigated here will be useful.
Measuring effectiveness of a university by a parallel network DEA model
NASA Astrophysics Data System (ADS)
Kashim, Rosmaini; Kasim, Maznah Mat; Rahman, Rosshairy Abd
2017-11-01
Universities contribute significantly to the development of human capital and to the socio-economic improvement of a country. Because of this, Malaysian universities have carried out various initiatives to improve their performance. Most studies have used the Data Envelopment Analysis (DEA) model to measure efficiency rather than effectiveness, even though measuring effectiveness is important for understanding how effective a university is in achieving its ultimate goals. A university system has two major functions, namely teaching and research, and each function has different resources based on its emphasis. A university is therefore structured as a parallel production system whose overall effectiveness is the aggregated effectiveness of teaching and research. Hence, this paper proposes a parallel network DEA model to measure the effectiveness of a university. This model takes the internal operations of both the teaching and research functions into account in computing the effectiveness of a university system. In the literature, graduates and the number of programs offered are defined as the outputs, while employed graduates and the number of programs accredited by professional bodies are considered the outcomes for measuring teaching effectiveness. The amount of grant funding is regarded as the output of research, while publications of different quality levels are considered the outcomes of research. A system is considered effective only if all of its functions are effective. The model has been tested using a hypothetical data set covering 14 faculties at a public university in Malaysia. The results show that none of the faculties is relatively effective in overall performance; three faculties are effective in teaching and two in research. A practical application of the parallel network DEA model is that it allows the top management of a university to identify weaknesses in any function and take rational steps for improvement.
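One common ratio form for the component scores in such a parallel-system DEA model looks as follows; this is a generic textbook shape with outcomes in place of outputs, not necessarily the exact model proposed in the paper:

```latex
% Generic ratio form for component scores in a parallel-system DEA model;
% subscripts T and R denote the teaching and research functions.
\[
E_k \;=\; \frac{\sum_{r} u_{r}\, y_{rk}}{\sum_{i} v_{i}\, x_{ik}}, \qquad
E \;=\; \omega_{T} E_{T} + \omega_{R} E_{R}, \qquad
\omega_{T} + \omega_{R} = 1,
\]
% with x the inputs, y the outcomes, and u, v the DEA multipliers; under the
% paper's effectiveness criterion, the university is effective only when both
% E_T and E_R equal one.
```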
1987-11-01
The purpose of the workshop was to bring together people whose interests lie in the areas of operating systems, programming languages, and formal methods, spanning operating system support and applications. There were parallel discussions on scheduling and distributed languages, and on real-time and operating systems. The discussions identified a number of key challenges for distributed systems, languages, and environments, among them making transactions efficient and integrating them into the operating system.
Soto-Quiros, Pablo
2015-01-01
This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool for analyzing the spectra of vector-valued discrete-time signals. The parallel implementation is developed in terms of a mathematical framework based on a set of block matrix operations. These block matrix operations contribute to the analysis, design, and implementation of parallel algorithms on multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage to using multicore processors and a parallel computing environment to reduce the high execution time. Additionally, the speedup increases as the number of logical processors and the length of the signal increase.
On parallel hybrid-electric propulsion system for unmanned aerial vehicles
NASA Astrophysics Data System (ADS)
Hung, J. Y.; Gonzalez, L. F.
2012-05-01
This paper presents a review of existing and current developments and an analysis of Hybrid-Electric Propulsion Systems (HEPS) for small fixed-wing Unmanned Aerial Vehicles (UAVs). Efficient energy utilisation on a UAV is essential to its functioning, often to achieve the operational goals of range, endurance, and other specific mission requirements. Due to the limited space available and the mass budget on the UAV, there is often a delicate balance between the onboard energy available (i.e., fuel) and achieving the operational goals. One technology with potential in this area is the use of HEPS. This paper provides information on the state-of-the-art technology in this field of research. A description and simulation of a parallel HEPS for a small fixed-wing UAV incorporating an Ideal Operating Line (IOL) control strategy is presented. Simulation models of the components in a HEPS were designed in the MATLAB Simulink environment. An IOL analysis of a UAV piston engine was used to determine the most efficient operating points for this engine. The results show that a UAV equipped with this HEPS configuration is capable of achieving a fuel saving of 6.5% compared to the engine-only configuration.
A Linguistic Model in Component Oriented Programming
NASA Astrophysics Data System (ADS)
Crăciunean, Daniel Cristian; Crăciunean, Vasile
2016-12-01
It is a fact that component-oriented programming, when well organized, can bring a large increase in efficiency to the development of large software systems. This paper proposes a model for building software systems by assembling components that can operate independently of each other. The model is based on a computing environment that runs parallel and distributed applications. The paper introduces concepts such as the abstract aggregation scheme and the aggregation application. Basically, an aggregation application is an application obtained by combining corresponding components. In our model, an aggregation application is a word in a language.
Serial multiplier arrays for parallel computation
NASA Technical Reports Server (NTRS)
Winters, Kel
1990-01-01
Arrays of systolic serial-parallel multiplier elements are proposed as an alternative to conventional SIMD mesh serial adder arrays for applications that are multiplication intensive and require few stored operands. The design and operation of a number of multiplier and array configurations featuring locality of connection, modularity, and regularity of structure are discussed. A design methodology combining top-down and bottom-up techniques is described to facilitate development of custom high-performance CMOS multiplier element arrays as well as rapid synthesis of simulation models and semicustom prototype CMOS components. Finally, a differential version of NORA dynamic circuits requiring a single-phase uncomplemented clock signal is introduced for this application.
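The behavior of a single serial-parallel multiplier element can be mimicked in software: one operand is held in parallel while the other arrives one bit per cycle, accumulating shifted partial products. A toy model (16-bit operands assumed; this illustrates the arithmetic, not the proposed CMOS circuits):

```cpp
#include <cstdint>
#include <iostream>

// One operand is held in parallel; the other is streamed in bit-serially,
// one bit per "cycle", exactly as a serial-parallel multiplier cell would.
std::uint32_t serialParallelMultiply(std::uint16_t parallelOp,
                                     std::uint16_t serialOp) {
  std::uint32_t acc = 0;
  for (int cycle = 0; cycle < 16; ++cycle) {
    std::uint32_t bit = (serialOp >> cycle) & 1u;   // serial input bit
    acc += bit * (static_cast<std::uint32_t>(parallelOp) << cycle);
  }
  return acc;
}

int main() { std::cout << serialParallelMultiply(300, 25) << "\n"; }  // 7500
```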
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner, including the use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work-stealing runtime, and report performance results for long-range HF energy calculations of a large molecule with a high-quality basis set running on up to 1024 cores of a high-performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Lin, Mingpei; Xu, Ming; Fu, Xiaoyu
2017-05-01
Currently, a tremendous amount of space debris in Earth orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on NVIDIA's Compute Unified Device Architecture (CUDA) platform for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for any amount of space debris over any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on an NVIDIA Tesla C2075. The simulation results demonstrate that, with the same computational accuracy as a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction for over 150 Chinese spacecraft over a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of orbital computation.
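The batch-overlap strategy described above is commonly expressed with CUDA streams: each batch's host-to-device copy, kernel launch, and device-to-host copy are issued on alternating streams so transfers and computation overlap. A hedged CUDA C++ sketch with a placeholder kernel (the real propagation math is omitted; true overlap also requires pinned host buffers):

```cpp
#include <cuda_runtime.h>

// Placeholder kernel; the actual orbit-propagation math is omitted.
__global__ void propagate(const double* in, double* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = in[i];
}

// Process debris states batch by batch on two alternating streams so each
// batch's transfers overlap the neighboring batch's kernel.
void runBatches(const double* h_in, double* h_out, int nBatches, int batchLen) {
  cudaStream_t s[2];
  for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);
  double *d_in, *d_out;
  cudaMalloc(&d_in, 2 * batchLen * sizeof(double));
  cudaMalloc(&d_out, 2 * batchLen * sizeof(double));
  for (int b = 0; b < nBatches; ++b) {
    int k = b % 2;  // ping-pong between the two streams/buffer halves
    cudaMemcpyAsync(d_in + k * batchLen, h_in + (long)b * batchLen,
                    batchLen * sizeof(double), cudaMemcpyHostToDevice, s[k]);
    propagate<<<(batchLen + 255) / 256, 256, 0, s[k]>>>(
        d_in + k * batchLen, d_out + k * batchLen, batchLen);
    cudaMemcpyAsync(h_out + (long)b * batchLen, d_out + k * batchLen,
                    batchLen * sizeof(double), cudaMemcpyDeviceToHost, s[k]);
  }
  cudaDeviceSynchronize();
  cudaFree(d_in); cudaFree(d_out);
  for (int i = 0; i < 2; ++i) cudaStreamDestroy(s[i]);
}

int main() {
  const int nBatches = 8, batchLen = 1 << 16;
  double *h_in, *h_out;
  cudaMallocHost(&h_in, (size_t)nBatches * batchLen * sizeof(double));   // pinned
  cudaMallocHost(&h_out, (size_t)nBatches * batchLen * sizeof(double));  // pinned
  runBatches(h_in, h_out, nBatches, batchLen);
  cudaFreeHost(h_in); cudaFreeHost(h_out);
}
```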
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-01-10
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
He, Fu-Liang; Wang, Lei; Yue, Zhen-Dong; Zhao, Hong-Wei; Liu, Fu-Quan
2014-09-07
To evaluate the feasibility of a second parallel transjugular intrahepatic portosystemic shunt (TIPS) to reduce portal venous pressure and control complications of portal hypertension. From January 2011 to December 2012, 10 cirrhotic patients were treated for complications of portal hypertension. The demographic data, operative data, postoperative recovery data, hemodynamic data, and complications were analyzed. Ten patients underwent a primary and parallel TIPS. Technical success rate was 100% with no technical complications. The mean duration of the first operation was 89.20 ± 29.46 min and the second operation was 57.0 ± 12.99 min. The mean portal system pressure decreased from 54.80 ± 4.16 mmHg to 39.0 ± 3.20 mmHg after the primary TIPS and from 44.40 ± 3.95 mmHg to 26.10 ± 4.07 mmHg after the parallel TIPS creation. The mean portosystemic pressure gradient decreased from 43.80 ± 6.18 mmHg to 31.90 ± 2.85 mmHg after the primary TIPS and from 35.60 ± 2.72 mmHg to 15.30 ± 3.27 mmHg after the parallel TIPS creation. Clinical improvement was seen in all patients after the parallel TIPS creation. One patient suffered from transient grade I hepatic encephalopathy (HE) after the primary TIPS and four patients experienced transient grade I-II after the parallel TIPS procedure. Mean hospital stay after the first and second operations were 15.0 ± 3.71 d and 16.90 ± 5.11 d (P = 0.014), respectively. After a mean 14.0 ± 3.13 mo follow-up, ascites and bleeding were well controlled and no stenosis of the stents was found. Parallel TIPS is an effective approach for controlling portal hypertension complications.
Multitasking TORT under UNICOS: Parallel performance models and measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barnett, A.; Azmy, Y.Y.
1999-09-27
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The predictions of the parallel performance models were compared to measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.
1999-09-27
The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The predictions of the parallel performance models were compared to measurements from applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.
Novel Door-opening Method for Six-legged Robots Based on Only Force Sensing
NASA Astrophysics Data System (ADS)
Chen, Zhi-Jun; Gao, Feng; Pan, Yang
2017-09-01
Current door-opening methods are mainly developed for tracked, wheeled, and biped robots using multi-DOF manipulators and vision systems. However, door-opening methods for six-legged robots are seldom studied, especially methods using 0-DOF tools to operate and only force sensing to detect. A novel door-opening method for six-legged robots is developed and implemented on a six-parallel-legged robot. The kinematic model of the six-parallel-legged robot is established, and a model for measuring the positional relationship between the robot and the door is proposed. The measurement model is based entirely on force sensing. A real-time trajectory planning method and a control strategy are designed. The trajectory planning method allows the maximum angle between the sagittal axis of the robot body and the normal line of the door plane to be 45°. A 0-DOF tool mounted on the robot body is used to operate. By integrating with the body, the tool has 6 DOFs and enough workspace to operate. The loose grasp achieved by the tool helps release the internal force in the tool. Experiments are carried out to validate the method. The results show that the method is effective and robust in opening doors wider than 1 m. This paper thus proposes a novel door-opening method for six-legged robots that notably uses a 0-DOF tool and only force sensing to detect and open the door.
2015-01-01
Implementing parallel and multivalued logic operations at the molecular scale has the potential to improve the miniaturization and efficiency of a new generation of nanoscale computing devices. Two-dimensional photon-echo spectroscopy is capable of resolving dynamical pathways on electronic and vibrational molecular states. We experimentally demonstrate the implementation of molecular decision trees, logic operations in which all possible values of the inputs are processed in parallel and the outputs are read simultaneously, by probing the laser-induced dynamics of populations and coherences in a rhodamine dye mounted on a short DNA duplex. The inputs are provided by the bilinear interactions between the molecule and the laser pulses, and the output values are read from the two-dimensional molecular response at specific frequencies. Our results highlight how ultrafast dynamics between multiple molecular states induced by light–matter interactions can be used to advantage for performing complex logic operations in parallel, operations that are faster than electrical switching. PMID:25984269
The Flight Deck Perspective of the NASA Langley AILS Concept
NASA Technical Reports Server (NTRS)
Rine, Laura L.; Abbott, Terence S.; Lohr, Gary W.; Elliott, Dawn M.; Waller, Marvin C.; Perry, R. Brad
2000-01-01
Many US airports depend on parallel runway operations to meet the growing demand for day-to-day operations. In the current airspace system, Instrument Meteorological Conditions (IMC) reduce the capacity of close parallel runway operations, that is, runways spaced closer than 4300 ft. These capacity losses can result in landing delays, causing inconvenience to the traveling public, interruptions in commerce, and increased operating costs to the airlines. This document presents the flight deck perspective component of the Airborne Information for Lateral Spacing (AILS) approaches to close parallel runways in IMC. It represents the ideas the NASA Langley Research Center (LaRC) AILS Development Team envisions for integrating a number of components and procedures into a workable system for conducting close parallel runway approaches. Initial documentation of aspects of this concept was sponsored by LaRC and completed in 1996. Since that time, a number of the aspects have evolved to a more mature state. This paper is an update of the earlier documentation.
NASA Technical Reports Server (NTRS)
Peterson, Victor L.; Kim, John; Holst, Terry L.; Deiwert, George S.; Cooper, David M.; Watson, Andrew B.; Bailey, F. Ron
1992-01-01
Report evaluates supercomputer needs of five key disciplines: turbulence physics, aerodynamics, aerothermodynamics, chemistry, and mathematical modeling of human vision. Predicts these fields will require computer speed greater than 10^18 floating-point operations per second (FLOPS) and memory capacity greater than 10^15 words. Also, new parallel computer architectures and new structured numerical methods will make necessary speed and capacity available.
ERIC Educational Resources Information Center
Klemen, Jane; Buchel, Christian; Buhler, Mira; Menz, Mareike M.; Rose, Michael
2010-01-01
Attentional interference between tasks performed in parallel is known to have strong and often undesired effects. As yet, however, the mechanisms by which interference operates remain elusive. A better knowledge of these processes may facilitate our understanding of the effects of attention on human performance and the debilitating consequences…
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, and use of a scalable, high-performance, distributed-parallel data storage system developed in the ARPA-funded MAGIC gigabit testbed. A collection of wide-area distributed disk servers operate in parallel to provide logical block-level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Terminal Area Procedures for Paired Runways
NASA Technical Reports Server (NTRS)
Lozito, Sandra; Verma, Savita Arora
2011-01-01
Parallel runway operations have been found to increase capacity within the National Airspace System, but poor visibility conditions reduce the use of these operations. The NextGen and SESAR programs have identified the capacity benefits of increased use of closely-spaced parallel runways. Previous research examined the concepts and procedures related to parallel runways; however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This simulation study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (+/- 10 s error) at a coupling point about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and future auto speed control automation) and two flight deck displays that assisted pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Results show the operations in this study were acceptable and safe. Subjective workload when using the pairing procedures and tools was generally low for both controllers and pilots, and situation awareness was typically moderate to high. Pilot workload was influenced by display type and automation condition. Further research on pairing and off-nominal conditions is required; however, this investigation identified promising findings about the feasibility of closely-spaced parallel runway operations.
Performance of the SERI parallel-passage dehumidifier
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schlepp, D.; Barlow, R.
1984-09-01
The key component in improving the performance of solar desiccant cooling systems is the dehumidifier. A parallel-passage geometry for the desiccant dehumidifier has been identified as meeting key criteria of low pressure drop, high mass transfer efficiency, and compact size. An experimental program to build and test a small-scale prototype of this design was undertaken in FY 1982, and the results are presented in this report. Computer models to predict the adsorption/desorption behavior of desiccant dehumidifiers were updated to take into account the geometry of the bed and predict potential system performance using the new component design. The parallel-passage design proved to have high mass transfer effectiveness and low pressure drop over a wide range of test conditions typical of desiccant cooling system operation. The prototype dehumidifier averaged 93% effectiveness at pressure drops of less than 50 Pa at design point conditions. Predictions of system performance using models validated with the experimental data indicate that system thermal coefficients of performance (COPs) of 1.0 to 1.2 and electrical COPs above 8.5 are possible using this design.
Xray: N-dimensional, labeled arrays for analyzing physical datasets in Python
NASA Astrophysics Data System (ADS)
Hoyer, S.
2015-12-01
Efficient analysis of geophysical datasets requires tools that both preserve and utilize metadata, and that transparently scale to process large datasets. Xray is such a tool, in the form of an open source Python library for analyzing the labeled, multi-dimensional array (tensor) datasets that are ubiquitous in the Earth sciences. Xray's approach pairs Python data structures based on the data model of the netCDF file format with the proven design and user interface of pandas, the popular Python data analysis library for labeled tabular data. On top of the NumPy array, xray adds labeled dimensions (e.g., "time") and coordinate values (e.g., "2015-04-10"), which it uses to enable a host of operations powered by these labels: selection, aggregation, alignment, broadcasting, split-apply-combine, interoperability with pandas and serialization to netCDF/HDF5. Many of these operations are enabled by xray's tight integration with pandas. Finally, to allow for easy parallelism and to enable its labeled data operations to scale to datasets that do not fit into memory, xray integrates with the parallel processing library dask.
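As a rough illustration of the labeled operations described above, consider this minimal sketch (the library has since been renamed xarray, whose API follows the same design; the data and coordinate values are hypothetical):

```python
import numpy as np
import xarray as xr   # "xray" was later renamed "xarray"; the design is as described above

temps = xr.DataArray(
    np.random.rand(3, 4),                                    # hypothetical data
    dims=("time", "station"),
    coords={"time": np.array(["2015-04-08", "2015-04-09",
                              "2015-04-10"], dtype="datetime64[D]")},
    name="temperature",
)

day = temps.sel(time="2015-04-10")   # label-based selection
means = temps.mean(dim="station")    # aggregation by dimension name, not axis number
frame = temps.to_pandas()            # interoperability with pandas
```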
A unifying framework for systems modeling, control systems design, and system operation
NASA Technical Reports Server (NTRS)
Dvorak, Daniel L.; Indictor, Mark B.; Ingham, Michel D.; Rasmussen, Robert D.; Stringfellow, Margaret V.
2005-01-01
Current engineering practice in the analysis and design of large-scale multi-disciplinary control systems is typified by some form of decomposition- whether functional or physical or discipline-based-that enables multiple teams to work in parallel and in relative isolation. Too often, the resulting system after integration is an awkward marriage of different control and data mechanisms with poor end-to-end accountability. System of systems engineering, which faces this problem on a large scale, cries out for a unifying framework to guide analysis, design, and operation. This paper describes such a framework based on a state-, model-, and goal-based architecture for semi-autonomous control systems that guides analysis and modeling, shapes control system software design, and directly specifies operational intent. This paper illustrates the key concepts in the context of a large-scale, concurrent, globally distributed system of systems: NASA's proposed Array-based Deep Space Network.
Evaluation of Proteus as a Tool for the Rapid Development of Models of Hydrologic Systems
NASA Astrophysics Data System (ADS)
Weigand, T. M.; Farthing, M. W.; Kees, C. E.; Miller, C. T.
2013-12-01
Models of modern hydrologic systems can be complex and involve a variety of operators with varying character. The goal is to implement approximations of such models that are both efficient for the developer and computationally efficient, which is a set of naturally competing objectives. Proteus is a Python-based toolbox that supports prototyping of model formulations as well as a wide variety of modern numerical methods and parallel computing. We used Proteus to develop numerical approximations for three models: Richards' equation, a brine flow model derived using the Thermodynamically Constrained Averaging Theory (TCAT), and a multiphase TCAT-based tumor growth model. For Richards' equation, we investigated discontinuous Galerkin solutions with higher order time integration based on the backward difference formulas. The TCAT brine flow model was implemented using Proteus and a variety of numerical methods were compared to hand coded solutions. Finally, an existing tumor growth model was implemented in Proteus to introduce more advanced numerics and allow the code to be run in parallel. From these three example models, Proteus was found to be an attractive open-source option for rapidly developing high quality code for solving existing and evolving computational science models.
Verification and Planning Based on Coinductive Logic Programming
NASA Technical Reports Server (NTRS)
Bansal, Ajay; Min, Richard; Simon, Luke; Mallya, Ajay; Gupta, Gopal
2008-01-01
Coinduction is a powerful technique for reasoning about unfounded sets, unbounded structures, infinite automata, and interactive computations [6]. Where induction corresponds to least fixed-point semantics, coinduction corresponds to greatest fixed-point semantics. Recently coinduction has been incorporated into logic programming and an elegant operational semantics developed for it [11, 12]. This operational semantics is the greatest fixed-point counterpart of SLD resolution (SLD resolution imparts operational semantics to least fixed-point based computations) and is termed co-SLD resolution. In co-SLD resolution, a predicate goal p(t) succeeds if it unifies with one of its ancestor calls. In addition, rational infinite terms are allowed as arguments of predicates. Infinite terms are represented as solutions to unification equations, and the occurs check is omitted during the unification process. Coinductive Logic Programming (Co-LP) and co-SLD resolution can be used to elegantly perform model checking and planning. A combined SLD and co-SLD resolution based LP system forms the common basis for planning, scheduling, verification, model checking, and constraint solving [9, 4]. This is achieved by amalgamating SLD resolution, co-SLD resolution, and constraint logic programming [13] in a single logic programming system. Given that parallelism in logic programs can be implicitly exploited [8], complex, compute-intensive applications (planning, scheduling, model checking, etc.) can be executed in parallel on multi-core machines. Parallel execution can result in speed-ups as well as in larger instances of the problems being solved. In the remainder we elaborate on (i) how planning can be elegantly and efficiently performed under real-time constraints, (ii) how real-time systems can be elegantly and efficiently model-checked, and (iii) how hybrid systems can be verified in a combined system with both co-SLD and SLD resolution. Implementations of co-SLD resolution as well as preliminary implementations of the planning and verification applications have been developed [4]. Co-LP and Model Checking: The vast majority of properties to be verified can be classified into safety properties and liveness properties. It is well known within model checking that safety properties can be verified by reachability analysis, i.e., if a counter-example to the property exists, it can be finitely determined by enumerating all the reachable states of the Kripke structure.
An early warning indicator for atmospheric blocking events using transfer operators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tantet, Alexis, E-mail: a.j.j.tantet@uu.nl; Burgt, Fiona R. van der; Dijkstra, Henk A.
The existence of persistent midlatitude atmospheric flow regimes with time scales larger than 5–10 days, and indications of preferred transitions between them, motivates the development of early warning indicators for such regime transitions. In this paper, we use a hemispheric barotropic model together with estimates of transfer operators on a reduced phase space to develop an early warning indicator of the zonal to blocked flow transition in this model. It is shown that the spectrum of the transfer operators can be used to study the slow dynamics of the flow as well as the non-Markovian character of the reduction. The slowest motions are thereby found to have time scales of three to six weeks and to be associated with meta-stable regimes (and their transitions) which can be detected as almost-invariant sets of the transfer operator. From the energy budget of the model, we are able to explain the meta-stability of the regimes and the existence of preferred transition paths. Even though the model is highly simplified, the skill of the early warning indicator is promising, suggesting that the transfer operator approach can be used in parallel to an operational deterministic model for stochastic prediction or to assess forecast uncertainty.
Stem thrust prediction model for W-K-M double wedge parallel expanding gate valves
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eldiwany, B.; Alvarez, P.D.; Wolfe, K.
1996-12-01
An analytical model for determining the required valve stem thrust during opening and closing strokes of W-K-M parallel expanding gate valves was developed as part of the EPRI Motor-Operated Valve Performance Prediction Methodology (EPRI MOV PPM) Program. The model was validated against measured stem thrust data obtained from in-situ testing of three W-K-M valves. Model predictions show favorable, bounding agreement with the measured data for valves with Stellite 6 hardfacing on the disks and seat rings for water flow in the preferred flow direction (gate downstream). The maximum required thrust to open and to close the valve (excluding wedging and unwedging forces) occurs at a slightly open position and not at the fully closed position. In the nonpreferred flow direction, the model shows that premature wedging can occur during ΔP closure strokes even when the coefficients of friction at different sliding surfaces are within the typical range. This paper summarizes the model description and comparison against test data.
Rapid code acquisition algorithms employing PN matched filters
NASA Technical Reports Server (NTRS)
Su, Yu T.
1988-01-01
The performance of four algorithms using pseudonoise matched filters (PNMFs), for direct-sequence spread-spectrum systems, is analyzed. They are: parallel search with fixed-dwell detector (PL-FDD), parallel search with sequential detector (PL-SD), parallel-serial search with fixed-dwell detector (PS-FDD), and parallel-serial search with sequential detector (PS-SD). The operating characteristic for each detector and the mean acquisition time for each algorithm are derived. All the algorithms are studied in conjunction with the noncoherent integration technique, which enables the system to operate in the presence of data modulation. Several previous proposals using PNMFs are seen as special cases of the present algorithms.
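The matched-filter idea amounts to evaluating every code phase at once; a minimal numpy sketch of that parallel search, with a hypothetical 127-chip code and noise level, might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
pn = rng.choice([-1.0, 1.0], size=127)                  # stand-in 127-chip PN code
rx = np.roll(pn, 40) + 0.7 * rng.standard_normal(127)   # received: delayed code + noise

# Circular correlation via FFT: the matched filter tests all code phases at once
corr = np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(pn))).real
print(np.argmax(np.abs(corr)))   # -> 40, the estimated code offset
```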
Robinson, Thomas N; Jones, Edward L; Dunn, Christina L; Dunne, Bruce; Johnson, Elizabeth; Townsend, Nicole T; Paniccia, Alessandro; Stiegmann, Greg V
2015-06-01
The monopolar "Bovie" is used in virtually every laparoscopic operation. The active electrode and its cord emit radiofrequency energy that couples (or transfers) to nearby conductive material without direct contact. This phenomenon is increased when the active electrode cord is oriented parallel to another wire/cord. The parallel orientation of the "Bovie" and laparoscopic camera cords causes transfer of energy to the camera cord, resulting in cutaneous burns at the camera trocar incision. We hypothesized that separating the active electrode/camera cords would reduce thermal injury occurring at the camera trocar incision in comparison to parallel-oriented active electrode/camera cords. In this prospective, blinded, randomized controlled trial, patients undergoing standardized laparoscopic cholecystectomy were randomized to separated active electrode/camera cords or parallel-oriented active electrode/camera cords. The primary outcome variable was thermal injury determined by histology from skin biopsied at the camera trocar incision. Eighty-four patients participated. Baseline demographics were similar in the groups for age, sex, preoperative diagnosis, operative time, and blood loss. Thermal injury at the camera trocar incision was lower in the separated versus parallel group (31% vs 57%; P = 0.027). Separation of the laparoscopic camera cord from the active electrode cord decreases thermal injury from antenna coupling at the camera trocar incision in comparison to the parallel orientation of these cords. Therefore, parallel orientation of these cords (an arrangement promoted by integrated operating rooms) should be abandoned. The findings of this study should influence the operating room setup for all laparoscopic cases.
NASA Technical Reports Server (NTRS)
Cassell, Rick; Smith, Alex; Connors, Mary; Wojciech, Jack; Rosekind, Mark R. (Technical Monitor)
1996-01-01
As new technologies and procedures are introduced into the National Airspace System, whether they are intended to improve efficiency, capacity, or safety level, the quantification of potential changes in safety levels is of vital concern. Applications of technology can improve safety levels and allow the reduction of separation standards. An excellent example is the Precision Runway Monitor (PRM). By taking advantage of the surveillance and display advances of PRM, airports can run instrument parallel approaches to runways separated by 3400 feet with the same level of safety as parallel approaches to runways separated by 4300 feet using the standard technology. Despite a wealth of information from flight operations and testing programs, there is no readily quantifiable relationship between numerical safety levels and the separation standards that apply to aircraft on final approach. This paper presents a modeling approach to quantify the risk associated with reducing separation on final approach. Reducing aircraft separation, both laterally and longitudinally, has been the goal of several aviation R&D programs over the past several years. Many of these programs have focused on technological solutions to improve navigation accuracy, surveillance accuracy, aircraft situational awareness, controller situational awareness, and other technical and operational factors that are vital to maintaining flight safety. The risk assessment model relates different types of potential aircraft accidents and incidents and their contribution to overall accident risk. The framework links accident risks to a hierarchy of failsafe mechanisms characterized by procedures and interventions. The model will be used to assess the overall level of safety associated with reducing separation standards and the introduction of new technology and procedures, as envisaged under the Free Flight concept. The model framework can be applied to various aircraft scenarios, including parallel and in-trail approaches. This research was performed under contract to NASA and in cooperation with the FAA's Safety Division (ASY).
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse-grained parallelization and OpenMP [9] for fine-grained loop-level parallelism. The MPI programming paradigm assumes a private address space for each process. Data are transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse-grained process-level parallelization and loop-level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena that is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
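As a rough sketch of the two-level pattern described above (coarse-grained processes exchanging messages, fine-grained parallel inner loops), the following assumes mpi4py is available and uses numpy vectorization as a stand-in for OpenMP loop-level parallelism; the workload is hypothetical:

```python
from mpi4py import MPI   # assumed available; run with e.g. "mpiexec -n 4 python hybrid.py"
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Coarse-grained level: each MPI process owns one block of the index space
n = 1_000_000
lo, hi = rank * n // size, (rank + 1) * n // size
x = np.arange(lo, hi, dtype=np.float64)

# Fine-grained level: a vectorized inner loop (stand-in for an OpenMP parallel loop)
local = np.sum(np.sin(x) ** 2)

# Private address spaces: data moves only via explicit messages
total = comm.allreduce(local, op=MPI.SUM)
if rank == 0:
    print(total)
```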
Numerical Prediction of Non-Reacting and Reacting Flow in a Model Gas Turbine Combustor
NASA Technical Reports Server (NTRS)
Davoudzadeh, Farhad; Liu, Nan-Suey
2005-01-01
The three-dimensional, viscous, turbulent, reacting and non-reacting flow characteristics of a model gas turbine combustor operating on air/methane are simulated via an unstructured and massively parallel Reynolds-Averaged Navier-Stokes (RANS) code. This serves to demonstrate the capabilities of the code for design and analysis of real combustor engines. The effects of some design features of combustors are examined. In addition, the computed results are validated against experimental data.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Robot Acting on Moving Bodies (RAMBO): Interaction with tumbling objects
NASA Technical Reports Server (NTRS)
Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madhu; Harwood, David
1989-01-01
Interaction with tumbling objects will become more common as human activities in space expand. Attempting to interact with a large complex object translating and rotating in space, a human operator using only his visual and mental capacities may not be able to estimate the object motion, plan actions or control those actions. A robot system (RAMBO) equipped with a camera, which, given a sequence of simple tasks, can perform these tasks on a tumbling object, is being developed. RAMBO is given a complete geometric model of the object. A low level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative locations nearby the object sufficient for achieving the tasks. More specifically, low level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Then trajectories are created using dynamic interpolations between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
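For flavor, edge detection by local gradient operators, as in RAMBO's low-level vision, can be expressed as a data-parallel operation over all pixels at once; this numpy sketch uses a random stand-in image and a crude threshold, not the paper's actual filters:

```python
import numpy as np

img = np.random.rand(128, 128)            # stand-in for a camera frame (hypothetical)
gy, gx = np.gradient(img)                 # gradient operators applied at every pixel at once
mag = np.hypot(gx, gy)                    # edge strength map
edges = mag > mag.mean() + 2 * mag.std()  # crude threshold; the paper's corner
                                          # extraction by sector filtering would follow
```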
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 46 Shipping 4 2013-10-01 2013-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 46 Shipping 4 2014-10-01 2014-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 46 Shipping 4 2012-10-01 2012-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 46 Shipping 4 2011-10-01 2011-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
McElree, Brian; Carrasco, Marisa
2012-01-01
Feature and conjunction searches have been argued to delineate parallel and serial operations in visual processing. The authors evaluated this claim by examining the temporal dynamics of the detection of features and conjunctions. The 1st experiment used a reaction time (RT) task to replicate standard mean RT patterns and to examine the shapes of the RT distributions. The 2nd experiment used the response-signal speed–accuracy trade-off (SAT) procedure to measure discrimination (asymptotic detection accuracy) and detection speed (processing dynamics). Set size affected discrimination in both feature and conjunction searches but affected detection speed only in the latter. Fits of models to the SAT data that included a serial component overpredicted the magnitude of the observed dynamics differences. The authors concluded that both features and conjunctions are detected in parallel. Implications for the role of attention in visual processing are discussed. PMID:10641310
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing, defined as image processing in which all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered as parallel image processing techniques.
On the Use of Kronecker Operators for the Solution of Generalized Stochastic Petri Nets
NASA Technical Reports Server (NTRS)
Ciardo, Gianfranco; Tilgner, Marco
1996-01-01
We discuss how to describe the Markov chain underlying a generalized stochastic Petri net using Kronecker operators on smaller matrices. We extend previous approaches by allowing both an extensive type of marking-dependent behavior for the transitions and the presence of immediate synchronizations. The derivation of the results is thoroughly formalized, including the use of Kronecker operators in the treatment of the vanishing markings and the computation of impulse-based reward measures. We use our techniques to analyze a model whose solution using conventional methods would fail because of the state-space explosion. In the conclusion, we point out ideas to parallelize our approach.
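A minimal sketch of the core idea, building the generator of a combined Markov chain from Kronecker operators on small per-subnet matrices (here two hypothetical 2-state subnets with independent transitions; the paper's full treatment also handles synchronizations and marking-dependent rates):

```python
import numpy as np

# Infinitesimal generators of two small independent subnets (hypothetical rates)
Q1 = np.array([[-1.0,  1.0],
               [ 2.0, -2.0]])
Q2 = np.array([[-3.0,  3.0],
               [ 0.5, -0.5]])

# Kronecker sum: Q = Q1 (x) I + I (x) Q2 is the generator of the combined
# 4-state chain, expressed entirely through operations on the 2x2 factors.
Q = np.kron(Q1, np.eye(2)) + np.kron(np.eye(2), Q2)
# Synchronizing transitions would add Kronecker *products* of per-subnet
# transition matrices; marking dependence scales individual factor entries.
```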
Petri net controllers for distributed robotic systems
NASA Technical Reports Server (NTRS)
Lefebvre, D. R.; Saridis, George N.
1992-01-01
Petri nets are a well established modelling technique for analyzing parallel systems. When coupled with an event-driven operating system, Petri nets can provide an effective means for integrating and controlling the functions of distributed robotic applications. Recent work has shown that Petri net graphs can also serve as remarkably intuitive operator interfaces. In this paper, the advantages of using Petri nets as high-level controllers to coordinate robotic functions are outlined, the considerations for designing Petri net controllers are discussed, and simple Petri net structures for implementing an interface for operator supervision are presented. A detailed example is presented which illustrates these concepts for a sensor-based assembly application.
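A minimal sketch of the token-game semantics underlying such controllers, with a hypothetical three-place, two-transition net; the marking update is the standard m' = m + C[:, t]:

```python
import numpy as np

# places p0, p1, p2; transitions t0, t1 (hypothetical toy net)
C = np.array([[-1,  0],    # incidence matrix: C[p, t] = tokens gained at p when t fires
              [ 1, -1],
              [ 0,  1]])
m = np.array([1, 0, 0])    # initial marking

def fire(m, t):
    """Fire transition t if enabled; marking update m' = m + C[:, t]."""
    if np.all(m + np.minimum(C[:, t], 0) >= 0):   # all input places hold tokens
        return m + C[:, t]
    raise ValueError("transition not enabled")

m = fire(m, 0)   # -> [0, 1, 0]
m = fire(m, 1)   # -> [0, 0, 1]
```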
Mine Hoist Operator Training System. Phase I Report.
1978-11-01
[Checklist excerpt; layout lost in extraction] Bodies of knowledge for hoist brakes. Function: control speed of conveyances; hold conveyances in position. Structural components, types of brakes: disc; drum (jaw, parallel motion). Components of each type: discs/drums, pads/shoes, operating mechanisms. Operating mediums for braking: hydraulic/pneumatic, manual. Remaining checklist items cover shaft guides (wood, steel rails, wire rope), drum brake locks (full lock, half lock), levels, and drive motors (single...).
Velocity diagnostics of electron beams within a 140 GHz gyrotron
NASA Astrophysics Data System (ADS)
Polevoy, Jeffrey Todd
1989-06-01
Experimental measurements of the average axial velocity v_par of the electron beam within the M.I.T. 140 GHz MW gyrotron have been performed. The method involves the simultaneous measurement of the radial electrostatic potential of the electron beam V_p and the beam current I_b. V_p is measured with a capacitive probe installed near or within the gyrotron cavity, while I_b is measured with a previously installed Rogowski coil. Three capacitive probes have been designed and built, and two have operated within the gyrotron. The probe results are repeatable and consistent with theory. Measurements of v_par and calculations of the corresponding transverse-to-longitudinal beam velocity ratio alpha = v_perp/v_par at the cavity have been made at various gyrotron operating parameters. These measurements will provide insight into the causes of discrepancies between theoretical RF interaction efficiencies and the experimental efficiencies obtained with the M.I.T. 140 GHz MW gyrotron. The expected values of v_par and alpha are determined with a computer code (EGUN), which models the cathode and anode regions of the gyrotron and computes the trajectories and velocities of the electrons within it. There is good correlation between the expected and measured values of alpha at low alpha, with the expected values from EGUN often falling within the standard errors of the measured values.
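As a back-of-the-envelope illustration only (not the paper's model): assuming a thin annular beam, the textbook space-charge depression formula relates the measured probe potential and beam current to v_par, from which alpha follows; all numbers below are hypothetical:

```python
import numpy as np

eps0, e, me, c = 8.854e-12, 1.602e-19, 9.109e-31, 2.998e8

# Hypothetical operating point and geometry (NOT the actual MIT values)
V_gun = 80e3     # gun accelerating voltage [V]
I_b   = 35.0     # beam current [A], from the Rogowski coil
V_p   = 3.5e3    # beam potential depression [V], from the capacitive probe
Rw_rb = 1.5      # wall-to-beam radius ratio

# Thin-annular-beam depression estimate: V_p = I_b*ln(Rw/rb)/(2*pi*eps0*v_par);
# inverting it gives the average axial velocity
v_par = I_b * np.log(Rw_rb) / (2 * np.pi * eps0 * V_p)

gamma = 1 + e * (V_gun - V_p) / (me * c**2)   # beam energy after depression
v_tot = c * np.sqrt(1 - 1 / gamma**2)
alpha = np.sqrt(v_tot**2 - v_par**2) / v_par  # velocity ratio v_perp / v_par
print(f"v_par = {v_par:.3e} m/s, alpha = {alpha:.2f}")
```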
I/O Parallelization for the Goddard Earth Observing System Data Assimilation System (GEOS DAS)
NASA Technical Reports Server (NTRS)
Lucchesi, Rob; Sawyer, W.; Takacs, L. L.; Lyster, P.; Zero, J.
1998-01-01
The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center (GSFC) has developed the GEOS DAS, a data assimilation system that provides production support for NASA missions and will support NASA's Earth Observing System (EOS) in the coming years. The GEOS DAS will be used to provide background fields of meteorological quantities to EOS satellite instrument teams for use in their data algorithms as well as providing assimilated data sets for climate studies on decadal time scales. The DAO has been involved in prototyping parallel implementations of the GEOS DAS for a number of years and is now embarking on an effort to convert the production version from shared-memory parallelism to distributed-memory parallelism using the portable Message-Passing Interface (MPI). The GEOS DAS consists of two main components, an atmospheric General Circulation Model (GCM) and a Physical-space Statistical Analysis System (PSAS). The GCM operates on data that are stored on a regular grid while PSAS works with observational data that are scattered irregularly throughout the atmosphere. As a result, the two components have different data decompositions. The GCM is decomposed horizontally as a checkerboard with all vertical levels of each box existing on the same processing element (PE). The dynamical core of the GCM can also operate on a rotated grid, which requires communication-intensive grid transformations during GCM integration. PSAS groups observations on PEs in a more irregular and dynamic fashion.
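A minimal sketch of the checkerboard ownership rule described above, with hypothetical grid and PE-mesh sizes (all vertical levels of a horizontal point land on one PE):

```python
def owner(i, j, nlat, nlon, prows, pcols):
    """PE owning grid point (i, j): checkerboard blocks, all levels stay together."""
    return (i * prows // nlat) * pcols + (j * pcols // nlon)

# Hypothetical sizes: a 181 x 360 lat-lon grid on a 4 x 8 PE mesh
pe = owner(90, 180, nlat=181, nlon=360, prows=4, pcols=8)
print(pe)   # -> 12: row block 1 (of 4), column block 4 (of 8)
```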
Interactive Visualization of DGA Data Based on Multiple Views
NASA Astrophysics Data System (ADS)
Geng, Yujie; Lin, Ying; Ma, Yan; Guo, Zhihong; Gu, Chao; Wang, Mingtao
2017-01-01
The commissioning and operation of dissolved gas analysis (DGA) online monitoring makes up for the weaknesses of the traditional DGA method. However, the volume and high dimensionality of DGA data bring a huge challenge for monitoring and analysis. In this paper, we present a novel interactive visualization model for DGA data based on multiple views. The model imitates multi-angle analysis by combining parallel coordinates, a scatter-plot matrix and a data table. By offering brushing, collaborative filtering and focus+context techniques, the model provides a convenient and flexible interactive way to analyze and understand DGA data.
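For flavor, pandas ships a parallel-coordinates plot that matches the multi-view idea described above; a sketch with hypothetical DGA records:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

# Hypothetical DGA records: dissolved-gas concentrations [ppm] plus a status label
df = pd.DataFrame({
    "H2": [12, 150, 30], "CH4": [25, 90, 40], "C2H2": [0.5, 12.0, 1.0],
    "status": ["normal", "fault", "normal"],
})
parallel_coordinates(df, "status")   # one polyline per monitored sample
plt.show()
```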
Control of Wind Tunnel Operations Using Neural Net Interpretation of Flow Visualization Records
NASA Technical Reports Server (NTRS)
Buggele, Alvin E.; Decker, Arthur J.
1994-01-01
Neural net control of operations in a small subsonic/transonic/supersonic wind tunnel at Lewis Research Center is discussed. The tunnel and the layout for neural net control or control by other parallel processing techniques are described. The tunnel is an affordable, multiuser platform for testing instrumentation and components, as well as parallel processing and control strategies. Neural nets have already been tested on archival schlieren and holographic visualizations from this tunnel as well as recent supersonic and transonic shadowgraphs. This paper discusses the performance of neural nets for interpreting shadowgraph images in connection with a recent exercise for tuning the tunnel in a subsonic/transonic cascade mode of operation. That mode was used to perform wake surveys in connection with NASA's Advanced Subsonic Technology (AST) noise reduction program. The shadowgraph was presented to the neural nets as 60 by 60 pixel arrays. The outputs were tunnel parameters such as valve settings or tunnel state identifiers for selected tunnel operating points, conditions, or states. The neural nets were very sensitive, perhaps too sensitive, to shadowgraph pattern detail. However, the nets exhibited good immunity to variations in brightness, to noise, and to changes in contrast. The nets are fast enough so that ten or more can be combined per control operation to interpret flow visualization data, point sensor data, and model calculations. The pattern sensitivity of the nets will be utilized and tested to control wind tunnel operations at Mach 2.0 based on shock wave patterns.
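As a rough sketch of the input/output shape of such a net (the weights here are random and untrained; the paper's actual network and training are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.random((60, 60))                 # one 60x60 shadowgraph frame, as in the paper
W1 = 0.01 * rng.standard_normal((32, 3600))  # hypothetical, untrained weights
b1 = np.zeros(32)
W2 = 0.1 * rng.standard_normal((3, 32))      # 3 outputs: e.g. valve settings / state id
b2 = np.zeros(3)

h = np.tanh(W1 @ frame.ravel() + b1)         # hidden layer
y = W2 @ h + b2                              # predicted tunnel parameters
```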
Using Abstraction in Explicitly Parallel Programs.
1991-07-01
However, we only rely on sequential consistency of memory operations, including reads, writes and any synchronization primitives provided by the... explicit synchronization primitives. This demonstrates the practical power of sequentially consistent memory, as opposed to weaker models of memory that... a small set of synchronization primitives, all procedures have non-waiting specifications. This is in contrast to richer process-oriented
ERIC Educational Resources Information Center
Costa, Ann Marie
2012-01-01
A recent law in a New England state allowed public schools to operate with increased flexibility and autonomy through the authorization of the creation of Innovation Schools. This project study, a program evaluation using a convergent parallel mixed methods research design, allowed for a comprehensive evaluation of the first Innovation School…
Experimental characterization of a binary actuated parallel manipulator
NASA Astrophysics Data System (ADS)
Giuseppe, Carbone
2016-05-01
This paper describes the BAPAMAN (Binary Actuated Parallel MANipulator) series of parallel manipulators conceived at the Laboratory of Robotics and Mechatronics (LARM). The basic characteristics common to the BAPAMAN series are described: a reduced number of active degrees of freedom, and design solutions using flexural joints and Shape Memory Alloy (SMA) actuators to achieve miniaturization, cost reduction and easy operation. Given the peculiarities of the BAPAMAN architecture, specific experimental tests have been proposed and carried out to validate the design and to evaluate the practical performance of a built prototype, in particular in terms of operation and workspace characteristics.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point-scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real-world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity, emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
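The computational kernel being vectorized is essentially a large reduction of complex exponentials over scatterers; a numpy sketch with a hypothetical scene (numpy's vectorized arithmetic plays the role of the SIMD units):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
ranges = 1000.0 + 50.0 * rng.random(n)   # scatterer ranges [m], hypothetical scene
amps = rng.random(n)                     # scatterer amplitudes
wavelength = 3.19e-3                     # ~94 GHz mmW carrier

# One coherent return sample: a reduction of complex exponentials over all
# scatterers; each term is independent, so the sum vectorizes cleanly.
phase = -4.0 * np.pi * ranges / wavelength    # two-way path phase
sample = np.sum(amps * np.exp(1j * phase))
```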
Exploring types of play in an adapted robotics program for children with disabilities.
Lindsay, Sally; Lam, Ashley
2018-04-01
Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, and all but one of them showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remaining children showed mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary, play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence children's ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.
FPGA implementation of current-sharing strategy for parallel-connected SEPICs
NASA Astrophysics Data System (ADS)
Ezhilarasi, A.; Ramaswamy, M.
2016-01-01
This work develops an equal current-sharing algorithm for a number of single-ended primary-inductance converters (SEPICs) connected in parallel. The methodology involves the development of a state-space model to predict the condition for the existence of a stable equilibrium portrait. A variable-structure controller guides the trajectory, with a view to circumventing the circuit non-linearities and arriving at stable performance over the preferred operating range. The design exhibits acceptable servo and regulatory characteristics and the desired time response, and ensures regulation of the load voltage. The simulation results, validated through a field-programmable gate array-based prototype, illustrate its suitability for present-day applications.
NASA Technical Reports Server (NTRS)
Gorospe, George E., Jr.; Daigle, Matthew J.; Sankararaman, Shankar; Kulkarni, Chetan S.; Ng, Eley
2017-01-01
Prognostic methods enable operators and maintainers to predict the future performance of critical systems. However, these methods can be computationally expensive and may need to be performed each time new information about the system becomes available. In light of these computational requirements, we have investigated the application of graphics processing units (GPUs) as a computational platform for real-time prognostics. Recent advances in GPU technology have reduced cost and increased the computational capability of these highly parallel processing units, making them more attractive for the deployment of prognostic software. We present a survey of model-based prognostic algorithms with considerations for leveraging the parallel architecture of the GPU, and a case study of GPU-accelerated battery prognostics with computational performance results.
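A minimal sketch of the pattern, assuming CuPy is installed (its cp namespace mirrors numpy on the GPU); the battery model and numbers are hypothetical stand-ins for a real prognostic model:

```python
import cupy as cp   # assumed available; cp mirrors the numpy API on the GPU

n = 100_000                                   # particles propagated in parallel
soc = cp.full(n, 1.0)                         # hypothetical battery state of charge
rate = 1e-4 * (1 + 0.05 * cp.random.standard_normal(n))  # per-step discharge + noise

steps = 0
while float(soc.mean()) > 0.2:                # predict steps until end of discharge
    soc -= rate                               # every particle advances in one GPU kernel
    steps += 1
print("predicted steps to EOD:", steps)
```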
The ASC Sequoia Programming Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seager, M
2008-08-06
In the late 1980's and early 1990's, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in the mid 1970's first for the CDC 7600 and later extended from stack-based vector operations to memory-to-memory operations for the Cray 1s, lasted approximately 20 years (see Slide 5). The Cray vector era was deemed an extremely long-lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalar mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the operating system (LTSS and later NLTSS), communications protocols (LINCS), compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine-optimized libraries (e.g., SLATEC and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed the COS and UNICOS operating systems and environments on their own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above, including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low-cost CMOS microprocessors and their attendant departmental and mini-computers, and later workstations and personal computers. With the widespread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period was that systems were being developed with multiple 'cores' in them, called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coining the 'Killer Micro' term. After several generations of SMP platforms (e.g., the Sequent Balance 8000 with 8 33MHz MC32032s, the Alliant FX/8 with 8 MC68020s and FPGA-based vector units, and finally the BBN Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed overtake Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in the two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple, and hopefully many, generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach.
In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Codes (IDC). We also had a major emphasis on doing everything in a completely standards-based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general-purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature rich and, in particular, offer more cores. Thus, we focused on OpenMP and POSIX PThreads for programming SMP parallelism. These code porting efforts were led by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved around removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries, and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error-prone because the programmers used MPI directly and sprinkled the calls throughout the code.
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Hybrid parallel computing architecture for multiview phase shifting
NASA Astrophysics Data System (ADS)
Zhong, Kai; Li, Zhongwei; Zhou, Xiaohui; Shi, Yusheng; Wang, Congjun
2014-11-01
The multiview phase-shifting method shows its powerful capability in achieving high-resolution three-dimensional (3-D) shape measurement. Unfortunately, this ability results in very high computation costs, and 3-D computations have to be processed offline. To realize real-time 3-D shape measurement, a hybrid parallel computing architecture is proposed for multiview phase shifting. In this architecture, the central processing unit can cooperate with the graphics processing unit (GPU) to achieve hybrid parallel computing. The high-computation-cost procedures, including lens distortion rectification, phase computation, correspondence, and 3-D reconstruction, are implemented on the GPU, and a three-layer kernel function model is designed to simultaneously realize coarse-grained and fine-grained parallel computing. Experimental results verify that the developed system can perform 50 fps (frames per second) real-time 3-D measurement with 260 K 3-D points per frame. A speedup of up to 180 times is obtained using an NVIDIA GT560Ti graphics card rather than a sequential C implementation on a 3.4 GHz Intel Core i7 3770.
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of multi/many-core architectures. Several solutions relying on operating system (OS) threads work well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures, and conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared, offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns, and that those LWT libraries perform better than OS-thread-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.
Integrated microfluidic devices for combinatorial cell-based assays.
Yu, Zeta Tak For; Kamei, Ken-ichiro; Takahashi, Hiroko; Shu, Chengyi Jenny; Wang, Xiaopu; He, George Wenfu; Silverman, Robert; Radu, Caius G; Witte, Owen N; Lee, Ki-Bum; Tseng, Hsian-Rong
2009-06-01
The development of miniaturized cell culture platforms for performing parallel cultures and combinatorial assays is important in cell biology from the single-cell level to the system level. In this paper we developed an integrated microfluidic cell-culture platform, Cell-microChip (Cell-μChip), for parallel analyses of the effects of microenvironmental cues (i.e., culture scaffolds) on different mammalian cells and their cellular responses to external stimuli. As a model study, we demonstrated the ability of culturing and assaying several mammalian cells, such as NIH 3T3 fibroblast, B16 melanoma and HeLa cell lines, in a parallel way. For functional assays, first we tested drug-induced apoptotic responses from different cell lines. As a second functional assay, we performed "on-chip" transfection of a reporter gene encoding an enhanced green fluorescent protein (EGFP) followed by live-cell imaging of transcriptional activation of cyclooxygenase 2 (Cox-2) expression. Collectively, our Cell-μChip approach demonstrated the capability to carry out parallel operations and the potential to further integrate advanced functions and applications in the broader space of combinatorial chemistry and biology.
Parallel Semi-Implicit Spectral Element Atmospheric Model
NASA Astrophysics Data System (ADS)
Fournier, A.; Thomas, S.; Loft, R.
2001-05-01
The shallow-water equations (SWE) have long been used to test atmospheric-modeling numerical methods. The SWE contain essential wave-propagation and nonlinear effects of more complete models. We present a semi-implicit (SI) improvement of the Spectral Element Atmospheric Model to solve the SWE (SEAM, Taylor et al. 1997, Fournier et al. 2000, Thomas & Loft 2000). SE methods are h-p finite element methods combining the geometric flexibility of size-h finite elements with the accuracy of degree-p spectral methods. Our work suggests that exceptional parallel-computation performance is achievable by a General-Circulation-Model (GCM) dynamical core, even at modest climate-simulation resolutions (>1°). The code derivation involves weak variational formulation of the SWE, Gauss(-Lobatto) quadrature over the collocation points, and Legendre cardinal interpolators. Appropriate weak variation yields a symmetric positive-definite Helmholtz operator. To meet the Ladyzhenskaya-Babuska-Brezzi inf-sup condition and avoid spurious modes, we use a staggered grid. The SI scheme combines leapfrog and Crank-Nicholson schemes for the nonlinear and linear terms respectively. The localization of operations to elements ideally fits the method to cache-based microprocessor computer architectures: derivatives are computed as collections of small (8x8), naturally cache-blocked matrix-vector products. SEAM also has desirable boundary-exchange communication, like finite-difference models. Timings on the IBM SP and Compaq ES40 supercomputers indicate that the SI code (20-min timestep) requires 1/3 the CPU time of the explicit code (2-min timestep) for T42 resolutions. Both codes scale nearly linearly out to 400 processors. We achieved single-processor performance up to 30% of peak for both codes on the 375-MHz IBM Power-3 processors. Fast computation and linear scaling lead to a useful climate-simulation dycore only if enough model time is computed per unit wall-clock time. An efficient SI solver is essential to substantially increase this rate. Parallel preconditioning for an iterative conjugate-gradient elliptic solver is described. We are building a GCM dycore capable of 200 GFLOPS sustained performance on clustered RISC/cache architectures using hybrid MPI/OpenMP programming.
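A minimal sketch of that cache-blocked pattern, with hypothetical element counts (D stands in for the true spectral differentiation matrix):

```python
import numpy as np

nelem, p = 1000, 7                    # hypothetical: 1000 elements, degree-7 basis
D = np.random.rand(p + 1, p + 1)      # stand-in for the 8x8 differentiation matrix
u = np.random.rand(nelem, p + 1)      # field values at the Gauss-Lobatto points

# Derivatives as a collection of small, naturally cache-blocked matrix-vector
# products: one 8x8 matvec per element, no global sparse matrix needed.
du = np.einsum("ij,ej->ei", D, u)
```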
Optimal expression evaluation for data parallel architectures
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
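To make the optimization concrete, here is a minimal Python sketch of the underlying dynamic program on an expression tree over hypercube locations with a Hamming-distance movement cost. It is a naive O(n·|P|²) enumeration for illustration only; the paper's algorithm exploits the robustness of the metric to do better, and the tree encoding and names below are ours.

```python
# Naive sketch: cheapest location to materialize each subexpression,
# given a movement-cost metric d between processor locations.

def hamming(a, b):
    return bin(a ^ b).count("1")  # hypercube movement cost

def min_cost(tree, locations, d=hamming):
    """tree is ('leaf', home_location) or ('op', left_subtree, right_subtree);
    returns {p: cheapest cost of producing the subexpression's value at p}."""
    if tree[0] == "leaf":
        home = tree[1]
        return {p: d(home, p) for p in locations}
    _, left, right = tree
    cl = min_cost(left, locations, d)
    cr = min_cost(right, locations, d)
    # An operand computed at q can be moved to p for d(q, p) extra cost.
    best_l = {p: min(cl[q] + d(q, p) for q in locations) for p in locations}
    best_r = {p: min(cr[q] + d(q, p) for q in locations) for p in locations}
    return {p: best_l[p] + best_r[p] for p in locations}

locs = range(8)  # the 8 nodes of a 3-dimensional hypercube
expr = ("op", ("leaf", 0), ("op", ("leaf", 3), ("leaf", 5)))
costs = min_cost(expr, locs)
print(min(costs.items(), key=lambda kv: kv[1]))  # cheapest place to evaluate
```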
Zeki, Semir
2016-10-01
Results from a variety of sources, some many years old, lead ineluctably to a re-appraisal of the twin strategies of hierarchical and parallel processing used by the brain to construct an image of the visual world. Contrary to common supposition, there are at least three 'feed-forward' anatomical hierarchies that reach the primary visual cortex (V1) and the specialized visual areas outside it, in parallel. These anatomical hierarchies do not conform to the temporal order with which visual signals reach the specialized visual areas through V1. Furthermore, neither the anatomical hierarchies nor the temporal order of activation through V1 predict the perceptual hierarchies. The latter show that we see (and become aware of) different visual attributes at different times, with colour leading form (orientation) and directional visual motion, even though signals from fast-moving, high-contrast stimuli are among the earliest to reach the visual cortex (of area V5). Parallel processing, on the other hand, is much more ubiquitous than commonly supposed but is subject to a barely noticed but fundamental aspect of brain operations, namely that different parallel systems operate asynchronously with respect to each other and reach perceptual endpoints at different times. This re-assessment leads to the conclusion that the visual brain is constituted of multiple, parallel and asynchronously operating task- and stimulus-dependent hierarchies (STDH); which of these parallel anatomical hierarchies have temporal and perceptual precedence at any given moment is stimulus and task related, and dependent on the visual brain's ability to undertake multiple operations asynchronously. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Data for polarization in charmless B→φK*: A signal for new physics?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Das, Prasanta Kumar; Yang, K.-C.
2005-05-01
The recent observations of sizable transverse fractions of B→φK* may hint at the existence of new physics. We analyze all possible new-physics four-quark operators and find that two classes of new-physics operators could offer resolutions to the B→φK* polarization anomaly. The operators in the first class have structures (1−γ5)×(1−γ5) and σ(1−γ5)×σ(1−γ5), and those in the second class (1+γ5)×(1+γ5) and σ(1+γ5)×σ(1+γ5). For each class, the new-physics effects can be lumped into a single parameter. Two possible experimental results for the polarization phases, arg(A⊥)−arg(A∥) ≈ π or 0, originating from the phase ambiguity in the data, could be separately accounted for by our two new-physics scenarios: the first (second) scenario with the first (second) class of new-physics operators. The consistency between the data and our new-physics analysis suggests a small new-physics weak phase, together with a large(r) strong phase. We obtain sizable transverse fractions Λ∥∥ + Λ⊥⊥ ≈ Λ00, in accordance with the observations. We find Λ∥∥ ≈ 0.8 Λ⊥⊥ in the first scenario but Λ∥∥ ≳ Λ⊥⊥ in the second scenario. We discuss the impact of the new-physics weak phase on the observations.
A DNA-Based Semantic Fusion Model for Remote Sensing Data
Sun, Heng; Weng, Jian; Yu, Guangchuang; Massawe, Richard H.
2013-01-01
Semantic technology plays a key role in various domains, from conversation understanding to algorithm analysis. As the most efficient semantic tool, ontology can represent, process and manage the widespread knowledge. Nowadays, many researchers use ontology to collect and organize data's semantic information in order to maximize research productivity. In this paper, we firstly describe our work on the development of a remote sensing data ontology, with a primary focus on semantic fusion-driven research for big data. Our ontology is made up of 1,264 concepts and 2,030 semantic relationships. However, the growth of big data is straining the capacities of current semantic fusion and reasoning practices. Considering the massive parallelism of DNA strands, we propose a novel DNA-based semantic fusion model. In this model, a parallel strategy is developed to encode the semantic information in DNA for a large volume of remote sensing data. The semantic information is read in a parallel and bit-wise manner and an individual bit is converted to a base. By doing so, a considerable amount of conversion time can be saved, i.e., the cluster-based multi-processes program can reduce the conversion time from 81,536 seconds to 4,937 seconds for 4.34 GB source data files. Moreover, the size of result file recording DNA sequences is 54.51 GB for parallel C program compared with 57.89 GB for sequential Perl. This shows that our parallel method can also reduce the DNA synthesis cost. In addition, data types are encoded in our model, which is a basis for building type system in our future DNA computer. Finally, we describe theoretically an algorithm for DNA-based semantic fusion. This algorithm enables the process of integration of the knowledge from disparate remote sensing data sources into a consistent, accurate, and complete representation. This process depends solely on ligation reaction and screening operations instead of the ontology. PMID:24116207
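As a rough illustration of the bit-wise, parallel encoding step, the sketch below converts bits to bases across worker processes. The abstract does not specify the bit-to-base mapping or the chunking scheme, so both are assumptions here.

```python
# Illustrative sketch only: the paper encodes semantic information
# bit-wise into DNA bases in parallel, but the exact mapping is not
# given in the abstract, so the one below is an assumption.
from multiprocessing import Pool

BIT_TO_BASE = {0: "A", 1: "T"}  # assumed mapping, not from the paper

def encode_chunk(data: bytes) -> str:
    out = []
    for byte in data:
        for i in range(7, -1, -1):          # most significant bit first
            out.append(BIT_TO_BASE[(byte >> i) & 1])
    return "".join(out)

def encode_parallel(data: bytes, nproc: int = 4) -> str:
    size = max(1, len(data) // nproc)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(nproc) as pool:
        return "".join(pool.map(encode_chunk, chunks))

if __name__ == "__main__":
    print(encode_parallel(b"\x0f"))  # -> AAAATTTT
```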
Circumferential distortion modeling of the TF30-P-3 compression system
NASA Technical Reports Server (NTRS)
Mazzawy, R. S.; Banks, G. A.
1977-01-01
Circumferential inlet pressure and temperature distortion testing of the TF30-P-3 turbofan engine was conducted. At the test conditions run, the compression system was modelled as a multiple-segment parallel compressor. Aspects of engine operation and distortion configuration modelled include the effects of compressor bleeds, relative pressure-temperature distortion alignment, and circumferential distortion extent. Model predictions for limiting distortion amplitudes and flow distributions within the compression system were compared with test results in order to evaluate predicted trends. Relatively good agreement was obtained. The model also identified the low-pressure compressor as the stall-initiating component, in agreement with the data.
An evidential reasoning extension to quantitative model-based failure diagnosis
NASA Technical Reports Server (NTRS)
Gertler, Janos J.; Anderson, Kenneth C.
1992-01-01
The detection and diagnosis of failures in physical systems characterized by continuous-time operation are studied. A quantitative diagnostic methodology has been developed that utilizes the mathematical model of the physical system. On the basis of the latter, diagnostic models are derived each of which comprises a set of orthogonal parity equations. To improve the robustness of the algorithm, several models may be used in parallel, providing potentially incomplete and/or conflicting inferences. Dempster's rule of combination is used to integrate evidence from the different models. The basic probability measures are assigned utilizing quantitative information extracted from the mathematical model and from online computation performed therewith.
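For readers unfamiliar with Dempster's rule, the following minimal Python sketch combines two basic probability assignments from hypothetical parity-equation models; the frame of discernment and the mass values are invented for illustration, not taken from the paper.

```python
# Minimal Dempster's rule of combination for two basic probability
# assignments (BPAs). Focal elements are frozensets of hypotheses.

def combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2          # mass assigned to the empty set
    # Normalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

FAULT, OK = frozenset({"fault"}), frozenset({"ok"})
EITHER = FAULT | OK                           # total ignorance
model_a = {FAULT: 0.6, EITHER: 0.4}           # evidence from one parity model
model_b = {FAULT: 0.7, OK: 0.1, EITHER: 0.2}  # evidence from another
print(combine(model_a, model_b))
```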
DOE Office of Scientific and Technical Information (OSTI.GOV)
Godwin, Aaron
The scope will be limited to analyzing the effect of the EFC within the system and how one improperly installed coupling affects the rest of the HPFL system. The discussion will include normal operations, impaired flow, and service interruptions. Normal operations are defined as two-way flow to buildings. Impaired operations are defined as a building that only has one-way flow being provided to the building. Service interruptions occur when a building does not have water available to it. The project will look at the following aspects of the reliability of the HPFL system: mean time to failure (MTTF) of EFCs, mean time between failures (MTBF), series system models, and parallel system models. These calculations will then be used to discuss the reliability of the system when one of the couplings fails, and to compare the reliability of two-way feeds versus one-way feeds.
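A minimal sketch of the series/parallel reliability arithmetic referenced above, assuming exponentially distributed failures so that R(t) = exp(−t/MTTF); the MTTF value and component counts below are placeholders, not HPFL data.

```python
import math

def reliability(t_hours, mttf_hours):
    # Exponential failure model: R(t) = exp(-t / MTTF).
    return math.exp(-t_hours / mttf_hours)

def series(rs):       # a series system works only if every component works
    out = 1.0
    for r in rs:
        out *= r
    return out

def parallel(rs):     # a parallel system works if at least one component works
    out = 1.0
    for r in rs:
        out *= (1.0 - r)
    return 1.0 - out

r = reliability(t_hours=8760, mttf_hours=100_000)  # one coupling, one year
print(series([r] * 4))    # e.g. four couplings in series
print(parallel([r] * 2))  # e.g. a redundant two-way feed
```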
Parallel computing techniques for rotorcraft aerodynamics
NASA Astrophysics Data System (ADS)
Ekici, Kivanc
The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, modifications to the implicit operator, Lower-Upper Symmetric Gauss-Seidel (LU-SGS), originally used in TURNS, are performed. Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler-equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
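As a generic illustration of the Jacobian-free Newton-Krylov approach (not the TURNS implementation, which couples the solver to the rotorcraft flow operators), SciPy's newton_krylov can solve a toy nonlinear boundary-value residual:

```python
# Sketch: inexact Newton with a Krylov inner solve on a toy 1-D problem.
import numpy as np
from scipy.optimize import newton_krylov

def residual(u):
    # Toy nonlinear residual: discrete diffusion plus a quadratic term.
    r = np.zeros_like(u)
    r[1:-1] = u[:-2] - 2 * u[1:-1] + u[2:] + 0.1 * u[1:-1] ** 2
    r[0], r[-1] = u[0] - 1.0, u[-1]   # Dirichlet boundary conditions
    return r

u0 = np.linspace(1.0, 0.0, 50)        # initial guess
u = newton_krylov(residual, u0, method="lgmres")
print(abs(residual(u)).max())         # residual norm after convergence
```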
Runtime optimization of an application executing on a parallel computer
Faraj, Daniel A; Smith, Brian E
2014-11-18
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
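Stripped of patent phrasing, the claimed control flow reduces to the small decision procedure below (a paraphrase with hypothetical names, not code from the patent):

```python
from collections import namedtuple

Collective = namedtuple("Collective", "name is_root_based")

def maybe_tune(op, call_sites_by_node, execute, tune_and_execute):
    """Decide whether to run the collective `op` inside a tuning session."""
    if not op.is_root_based:
        return tune_and_execute(op)      # non-root-based: always tune
    if len(set(call_sites_by_node.values())) == 1:
        return tune_and_execute(op)      # same call site on all nodes
    return execute(op)                   # differing call sites: skip tuning

bcast = Collective("broadcast", is_root_based=True)
sites = {0: "main.c:42", 1: "main.c:42", 2: "main.c:57"}
print(maybe_tune(bcast, sites,
                 execute=lambda o: f"{o.name}: untuned",
                 tune_and_execute=lambda o: f"{o.name}: tuned"))
```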
PRAIS: Distributed, real-time knowledge-based systems made easy
NASA Technical Reports Server (NTRS)
Goldstein, David G.
1990-01-01
This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
Air Traffic and Operational Data on Selected US Airports with Parallel Runways
NASA Technical Reports Server (NTRS)
Doyle, Thomas M.; McGee, Frank G.
1998-01-01
This report presents information on a number of airports in the United States with parallel runways and focuses on those that have at least one pair of parallel runways closer than 4300 ft. The report describes each airport's current operational activity as obtained through contact with the facility and from FAA air traffic tower activity data for FY 1997. The primary purpose of this document is to provide a single source of information for research to determine airports where Airborne Information for Lateral Spacing (AILS) technology may be applicable.
Duan, Jizhong; Liu, Yu; Jing, Peiguang
2018-02-01
Self-consistent parallel imaging (SPIRiT) is an auto-calibrating model for the reconstruction of parallel magnetic resonance imaging, which can be formulated as a regularized SPIRiT problem. The Projection Over Convex Sets (POCS) method was used to solve the formulated regularized SPIRiT problem. However, the quality of the reconstructed image still needs to be improved. Though methods such as NonLinear Conjugate Gradients (NLCG) can achieve higher spatial resolution, these methods always demand very complex computation and converge slowly. In this paper, we propose a new algorithm to solve the formulated Cartesian SPIRiT problem with the JTV and JL1 regularization terms. The proposed algorithm uses the operator splitting (OS) technique to decompose the problem into a gradient problem and a denoising problem with two regularization terms, which is solved by our proposed split Bregman based denoising algorithm, and adopts the Barzilai and Borwein method to update step size. Simulation experiments on two in vivo data sets demonstrate that the proposed algorithm is 1.3 times faster than ADMM for datasets with 8 channels. Especially, our proposal is 2 times faster than ADMM for the dataset with 32 channels. Copyright © 2017 Elsevier Inc. All rights reserved.
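The Barzilai-Borwein update mentioned above sets the step size from successive iterates and gradients; a minimal sketch on a toy quadratic follows (our example, not the paper's SPIRiT objective):

```python
# Gradient descent with the Barzilai-Borwein (BB1) step size
# alpha_k = (s.s) / (s.y), where s = x_k - x_{k-1}, y = g_k - g_{k-1}.
import numpy as np

def bb_gradient_descent(grad, x, steps=50, alpha=1e-3):
    g_prev, x_prev = grad(x), x.copy()
    x = x - alpha * g_prev                   # one bootstrap step
    for _ in range(steps):
        g = grad(x)
        s, y = x - x_prev, g - g_prev
        alpha = (s @ s) / (s @ y)            # BB1 step size
        x_prev, g_prev = x.copy(), g
        x = x - alpha * g
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(bb_gradient_descent(lambda x: A @ x - b, np.zeros(2)))  # ~ A^{-1} b
```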
Modelling of loading, stress relaxation and stress recovery in a shape memory polymer.
Sweeney, J; Bonner, M; Ward, I M
2014-09-01
A multi-element constitutive model for a lactide-based shape memory polymer has been developed that represents loading to large tensile deformations, stress relaxation and stress recovery at 60, 65 and 70°C. The model consists of parallel Maxwell arms each comprising neo-Hookean and Eyring elements. Guiu-Pratt analysis of the stress relaxation curves yields Eyring parameters. When these parameters are used to define the Eyring process in a single Maxwell arm, the resulting model yields at too low a stress, but gives good predictions for longer times. Stress dip tests show a very stiff response on unloading by a small strain decrement. This would create an unrealistically high stress on loading to large strain if it were modelled by an elastic element. Instead it is modelled by an Eyring process operating via a flow rule that introduces strain hardening after yield. When this process is incorporated into a second parallel Maxwell arm, there results a model that fully represents both stress relaxation and stress dip tests at 60°C. At higher temperatures a third arm is required for valid predictions. Crown Copyright © 2014. Published by Elsevier Ltd. All rights reserved.
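For reference, the Eyring flow process invoked in these elements is conventionally written as

$$\dot{\varepsilon} = \dot{\varepsilon}_0 \exp\!\left(-\frac{\Delta H}{kT}\right)\sinh\!\left(\frac{V\sigma}{kT}\right),$$

with activation enthalpy $\Delta H$, activation volume $V$, stress $\sigma$ and temperature $T$. This is the standard form; the paper's fitted parameter values are not reproduced here.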
Tempest: Accelerated MS/MS Database Search Software for Heterogeneous Computing Platforms.
Adamo, Mark E; Gerber, Scott A
2016-09-07
MS/MS database search algorithms derive a set of candidate peptide sequences from in silico digest of a protein sequence database, and compute theoretical fragmentation patterns to match these candidates against observed MS/MS spectra. The original Tempest publication described these operations mapped to a CPU-GPU model, in which the CPU (central processing unit) generates peptide candidates that are asynchronously sent to a discrete GPU (graphics processing unit) to be scored against experimental spectra in parallel. The current version of Tempest expands this model, incorporating OpenCL to offer seamless parallelization across multicore CPUs, GPUs, integrated graphics chips, and general-purpose coprocessors. Three protocols describe how to configure and run a Tempest search, including discussion of how to leverage Tempest's unique feature set to produce optimal results. © 2016 by John Wiley & Sons, Inc.
NASA Technical Reports Server (NTRS)
Abbott, Terence S.
2011-01-01
This paper presents an overview of an algorithm specifically designed to support NASA's Airborne Precision Spacing concept. This airborne self-spacing concept is trajectory-based, allowing for spacing operations prior to the aircraft being on a common path. This implementation provides the ability to manage spacing against two traffic aircraft, with one of these aircraft operating to a parallel dependent runway. Because this algorithm is trajectory-based, it also has the inherent ability to support required-time-of-arrival (RTA) operations.
Parallel design patterns for a low-power, software-defined compressed video encoder
NASA Astrophysics Data System (ADS)
Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar
2011-06-01
Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memory-network processor device.
CICE, The Los Alamos Sea Ice Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hunke, Elizabeth; Lipscomb, William; Jones, Philip
The Los Alamos sea ice model (CICE) is the result of an effort to develop a computationally efficient sea ice component for a fully coupled atmosphere–land–ocean–ice global climate model. It was originally designed to be compatible with the Parallel Ocean Program (POP), an ocean circulation model developed at Los Alamos National Laboratory for use on massively parallel computers. CICE has several interacting components: a vertical thermodynamic model that computes local growth rates of snow and ice due to vertical conductive, radiative and turbulent fluxes, along with snowfall; an elastic-viscous-plastic model of ice dynamics, which predicts the velocity field of the ice pack based on a model of the material strength of the ice; an incremental remapping transport model that describes horizontal advection of the areal concentration, ice and snow volume and other state variables; and a ridging parameterization that transfers ice among thickness categories based on energetic balances and rates of strain. It also includes a biogeochemical model that describes evolution of the ice ecosystem. The CICE sea ice model is used for climate research as one component of complex global earth system models that include atmosphere, land, ocean and biogeochemistry components. It is also used for operational sea ice forecasting in the polar regions and in numerical weather prediction models.
Xu, Jingxiang; Higuchi, Yuji; Ozawa, Nobuki; Sato, Kazuhisa; Hashida, Toshiyuki; Kubo, Momoji
2017-09-20
Ni sintering in the Ni/YSZ porous anode of a solid oxide fuel cell changes the porous structure, leading to degradation. Preventing sintering and degradation during operation is a great challenge. Usually, a sintering molecular dynamics (MD) simulation model consisting of two particles on a substrate is used; however, such a model cannot reflect the effect of the porous structure on sintering. In our previous study, a multi-nanoparticle sintering modeling method with tens of thousands of atoms revealed the effect of the particle framework and porosity on sintering. However, that method cannot reveal the effect of particle size on sintering or the effect of sintering on the change in the porous structure. In the present study, we report a strategy to reveal these effects in the porous structure by using our multi-nanoparticle modeling method and a parallel large-scale multimillion-atom MD simulator. We used this method to investigate the effect of YSZ particle size and tortuosity on sintering and degradation in Ni/YSZ anodes. Our parallel large-scale MD simulation showed that the sintering degree decreased as the YSZ particle size decreased. The gas fuel diffusion path, which reflects the overpotential, was blocked by pore coalescence during sintering. The degradation of gas diffusion performance increased as the YSZ particle size increased. Furthermore, the gas diffusion performance was quantified by a tortuosity parameter, and an optimal YSZ particle size, equal to that of Ni, was found for good diffusion after sintering. These findings cannot be obtained by previous MD sintering studies with tens of thousands of atoms. The present parallel large-scale multimillion-atom MD simulation makes it possible to clarify the effects of particle size and tortuosity on sintering and degradation.
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple- instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan
2012-12-11
A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.
ERIC Educational Resources Information Center
von Davier, Matthias
2016-01-01
This report presents results on a parallel implementation of the expectation-maximization (EM) algorithm for multidimensional latent variable models. The developments presented here are based on code that parallelizes both the E step and the M step of the parallel-E parallel-M algorithm. Examples presented in this report include item response…
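In the same spirit (though on a toy Bernoulli mixture rather than a multidimensional item response model), an E step parallelized over chunks of respondents might look like this in Python:

```python
# Schematic parallel E step: posterior class probabilities computed
# chunk-wise across worker processes. Toy model and parameters are ours,
# purely to illustrate the parallel-E pattern described above.
import numpy as np
from multiprocessing import Pool

PARAMS = {"p": np.array([0.2, 0.8]), "w": np.array([0.5, 0.5])}

def e_step_chunk(responses):
    # Posterior class probabilities for a chunk of 0/1 response vectors.
    post = []
    for r in responses:
        k, n = r.sum(), len(r)
        lik = PARAMS["w"] * PARAMS["p"] ** k * (1 - PARAMS["p"]) ** (n - k)
        post.append(lik / lik.sum())
    return np.array(post)

if __name__ == "__main__":
    data = (np.random.rand(1000, 10) < 0.6).astype(int)
    chunks = np.array_split(data, 4)
    with Pool(4) as pool:                       # E step over 4 workers
        posteriors = np.vstack(pool.map(e_step_chunk, chunks))
    print(posteriors.mean(axis=0))              # feeds the M step
```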
Using the Statecharts paradigm for simulation of patient flow in surgical care.
Sobolev, Boris; Harel, David; Vasilakis, Christos; Levy, Adrian
2008-03-01
Computer simulation of patient flow has been used extensively to assess the impacts of changes in the management of surgical care. However, little research is available on the utility of existing modeling techniques. The purpose of this paper is to examine the capacity of Statecharts, a system of graphical specification, for constructing a discrete-event simulation model of the perioperative process. The Statecharts specification paradigm was originally developed for representing reactive systems by extending the formalism of finite-state machines through notions of hierarchy, parallelism, and event broadcasting. Hierarchy permits subordination between states so that one state may contain other states. Parallelism permits more than one state to be active at any given time. Broadcasting of events allows one state to detect changes in another state. In the context of the peri-operative process, hierarchy provides the means to describe steps within activities and to cluster related activities, parallelism provides the means to specify concurrent activities, and event broadcasting provides the means to trigger a series of actions in one activity according to transitions that occur in another activity. Combined with hierarchy and parallelism, event broadcasting offers a convenient way to describe the interaction of concurrent activities. We applied the Statecharts formalism to describe the progress of individual patients through surgical care as a series of asynchronous updates in patient records generated in reaction to events produced by parallel finite-state machines representing concurrent clinical and managerial activities. We conclude that Statecharts capture successfully the behavioral aspects of surgical care delivery by specifying permissible chronology of events, conditions, and actions.
A Tutorial on Parallel and Concurrent Programming in Haskell
NASA Astrophysics Data System (ADS)
Peyton Jones, Simon; Singh, Satnam
This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing especially in the field of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link operates at 8-bit parallel in a full duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
NASA Technical Reports Server (NTRS)
Halpert, G.; Webb, D. A.
1983-01-01
Three batteries were operated in parallel from a common bus during charge and discharge. SMM utilized NASA Standard 20AH cells and batteries, and LANDSAT-D NASA 50AH cells and batteries of a similar design. Each battery consisted of 22 series connected cells providing the nominal 28V bus. The three batteries were charged in parallel using the voltage limit/current taper mode wherein the voltage limit was temperature compensated. Discharge occurred on the demand of the spacecraft instruments and electronics. Both flights were planned for three to five year missions. The series/parallel configuration of cells and batteries for the 3-5 yr mission required a well controlled product with built-in reliability and uniformity. Examples of how component, cell and battery selection methods affect the uniformity of the series/parallel operation of the batteries both in testing and in flight are given.
NASA Astrophysics Data System (ADS)
Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
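Schematically, the operator-split iteration solves

$$J^{(k+1)} = \Lambda^{*}\!\left[S^{(k+1)}\right] + \left(\Lambda - \Lambda^{*}\right)\!\left[S^{(k)}\right],$$

so that only the narrow-banded $\Lambda^{*}$ must be inverted at each step; exploiting its band structure is where the parallel numerical algorithm described above gains its speed-up. (This is the standard accelerated-Λ-iteration form; the PHOENIX/3D specifics are in the paper.)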
Yip, Hon Ming; Li, John C. S.; Cui, Xin; Gao, Qiannan; Leung, Chi Chiu
2014-01-01
As microfluidics has been applied extensively in many cell and biochemical applications, monitoring the related processes is an important requirement. In this work, we design and fabricate a high-throughput microfluidic device which contains 32 microchambers to perform automated parallel microfluidic operations and monitoring on an automated stage of a microscope. Images are captured at multiple spots on the device during the operations for monitoring samples in microchambers in parallel; yet the device positions may vary at different time points throughout operations as the device moves back and forth on a motorized microscopic stage. Here, we report an image-based positioning strategy to realign the chamber position before every recording of microscopic image. We fabricate alignment marks at defined locations next to the chambers in the microfluidic device as reference positions. We also develop image processing algorithms to recognize the chamber positions in real-time, followed by realigning the chambers to their preset positions in the captured images. We perform experiments to validate and characterize the device functionality and the automated realignment operation. Together, this microfluidic realignment strategy can be a platform technology to achieve precise positioning of multiple chambers for general microfluidic applications requiring long-term parallel monitoring of cell and biochemical activities. PMID:25133248
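A minimal sketch of such image-based realignment using template matching in OpenCV follows; the mark geometry, stored template, and expected position below are assumptions for illustration, not the paper's own algorithm.

```python
# Locate an alignment mark by normalized cross-correlation and return
# the pixel offset from its expected position.
import cv2
import numpy as np

def find_mark_offset(frame_gray, mark_template, expected_xy):
    res = cv2.matchTemplate(frame_gray, mark_template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(res)      # best-match top-left corner
    return max_loc[0] - expected_xy[0], max_loc[1] - expected_xy[1]

frame = np.zeros((200, 200), np.uint8)
cv2.rectangle(frame, (70, 50), (80, 60), 255, 2)   # simulated mark outline
template = frame[45:65, 65:85].copy()              # stored reference patch
print(find_mark_offset(frame, template, expected_xy=(63, 43)))  # -> (2, 2)

# The stage correction is then the negated offset, scaled from pixels to
# the motorized stage's units, applied before each chamber image.
```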
NASA Astrophysics Data System (ADS)
Zodiatis, George; Radhakrishnan, Hari; Lardner, Robin; Hayes, Daniel; Gertman, Isaac; Menna, Milena; Poulain, Pierre-Marie
2014-05-01
The general anticlockwise circulation along the coastline of the Eastern Mediterranean Levantine Basin was first proposed by Nielsen in 1912. Half a century later, the schematic of the circulation in the area was enriched with sub-basin flow structures. In the late 1980s, a more detailed picture of the circulation composed of eddies, gyres and coastal-offshore jets was defined during the POEM cruises. In 2005, Millot and Taupier-Letage used SST satellite imagery to argue for a simpler pattern similar to the one proposed almost a century ago. During the last decade, renewed in-situ multi-platform investigations under the framework of the CYBO, CYCLOPS, NEMED, GROOM, HaiSec and PERSEUS projects, as well as the development of operational ocean forecasts and hindcasts in the framework of the MFS, ECOOP, MERSEA and MyOcean projects, have made it possible to obtain an improved, higher spatial and temporal resolution picture of the circulation in the area. After some years of scientific disputes on the circulation pattern of the region, the new in-situ data sets and the operational numerical simulations confirm the relevant POEM results. The existing POM-based Cyprus Coastal Ocean Forecasting System (CYCOFOS), downscaling the MyOcean MFS, has been providing operational forecasts in the Eastern Mediterranean Levantine Basin region since early 2002. Recently, Radhakrishnan et al. (2012) parallelized the CYCOFOS hydrodynamic flow model using MPI to improve the accuracy of predictions while reducing the computational time. The parallel flow model is capable of modeling the Eastern Mediterranean Levantine Basin flow at a resolution of 500 m. The model was run in hindcast mode, during which the innovations were computed using historical data collected by gliders and cruises. Then DD-OceanVar (D'Amore et al., 2013), a data assimilation tool based on 3DVAR developed by CMCC, was used to compute the temperature and salinity field corrections. Numerical modeling results after the data assimilation will be presented.
NASA Astrophysics Data System (ADS)
Mariani, S.; Casaioli, M.; Lastoria, B.; Accadia, C.; Flavoni, S.
2009-04-01
The Institute for Environmental Protection and Research - ISPRA (former Agency for Environmental Protection and Technical Services - APAT) has operated since 2000 an integrated meteo-marine forecasting chain, named the Hydro-Meteo-Marine Forecasting System (Sistema Idro-Meteo-Mare - SIMM), formed by a cascade of four numerical models, telescoping from the Mediterranean basin to the Venice Lagoon, and initialized by means of analyses and forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF). The operational integrated system consists of a meteorological model, the parallel version of the BOlogna Limited Area Model (BOLAM), coupled over the Mediterranean Sea with a WAve Model (WAM), a high-resolution shallow-water model of the Adriatic and Ionian Seas, namely the Princeton Ocean Model (POM), and a finite-element version of the same model (VL-FEM) on the Venice Lagoon, aimed at forecasting acqua alta events. Recently, the physically based, fully distributed, rainfall-runoff TOPographic Kinematic APproximation and Integration (TOPKAPI) model has been integrated into the system, coupled to BOLAM, over two river basins, located in the central and northeastern parts of Italy, respectively. However, at the present time this latter part of the forecasting chain is not operational and is used in a research configuration. BOLAM was originally implemented in 2000 on the Quadrics parallel supercomputer (and for this reason also referred to as QBOLAM), and only at the end of 2006 was it ported (together with the other operational marine models of the forecasting chain) onto the Silicon Graphics Inc. (SGI) Altix 8-processor machine. In particular, due to the Quadrics implementation, the Kuo scheme was formerly implemented in QBOLAM for the cumulus convection parameterization. When porting SIMM onto the Altix Linux cluster, it became possible to implement in QBOLAM the more advanced convection parameterization by Kain and Fritsch. A fully updated serial version of the BOLAM code has recently been acquired. Code improvements include a more precise advection scheme (Weighted Average Flux); explicit advection of five hydrometeors; and state-of-the-art parameterization schemes for radiation, convection, boundary-layer turbulence and soil processes (also with a possible choice among different available schemes). The operational implementation of the new code into the SIMM model chain, which requires the development of a parallel version, will be achieved during 2009. In view of this goal, comparative verification of the different model versions' skill represents a fundamental task. For this purpose, it has been decided to evaluate the performance improvement of the new BOLAM code (in the available serial version, hereinafter BOLAM 2007) with respect to the version with the Kain-Fritsch scheme (hereinafter KF version) and to the older one employing the Kuo scheme (hereinafter Kuo version). In the present work, verification of precipitation forecasts from the three BOLAM versions is carried out in a case-study approach. The intense rainfall episode that occurred on 10th-17th December 2008 over Italy has been considered. This event produced severe damage in Rome and its surrounding areas. Objective and subjective verification methods have been employed in order to evaluate model performance against an observational dataset including rain gauge observations and satellite imagery.
Subjective comparison of observed and forecast precipitation fields is suitable for giving an overall description of forecast quality. Spatial errors (e.g., shifting and pattern errors) and rainfall volume error can be assessed quantitatively by means of object-oriented methods. By comparing satellite images with model forecast fields, it is possible to investigate the differences between the evolution of the observed weather system and the predicted ones, and its sensitivity to the improvements in the model code. Finally, the error in forecasting the cyclone evolution can be tentatively related to the precipitation forecast error.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
NASA Astrophysics Data System (ADS)
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; Ng, Cho-Kuen; Rivetta, Claudio
2017-10-01
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
The cognitive architecture for chaining of two mental operations.
Sackur, Jérôme; Dehaene, Stanislas
2009-05-01
A simple view, which dates back to Turing, proposes that complex cognitive operations are composed of serially arranged elementary operations, each passing intermediate results to the next. However, whether and how such serial processing is achieved with a brain composed of massively parallel processors remains an open question. Here, we study the cognitive architecture for chained operations with an elementary arithmetic algorithm: we required participants to add two to (or subtract two from) a digit, and then compare the result with five. In four experiments, we probed the internal implementation of this task with chronometric analysis, the cued-response method, the priming method, and a subliminal forced-choice procedure. We found evidence for an approximately sequential processing, with an important qualification: the second operation in the algorithm appears to start before completion of the first operation. Furthermore, initially the second operation takes as input the stimulus number rather than the output of the first operation. Thus, operations that should be processed serially are in fact executed partially in parallel. Furthermore, although each elementary operation can proceed subliminally, their chaining does not occur in the absence of conscious perception. Overall, the results suggest that chaining is slow, effortful, imperfect (resulting partly in parallel rather than serial execution) and dependent on conscious control.
High Performance Parallel Computational Nanotechnology
NASA Technical Reports Server (NTRS)
Saini, Subhash; Craw, James M. (Technical Monitor)
1995-01-01
At a recent press conference, NASA Administrator Dan Goldin encouraged NASA Ames Research Center to take a lead role in promoting research and development of advanced, high-performance computer technology, including nanotechnology. Manufacturers of leading-edge microprocessors currently perform large-scale simulations in the design and verification of semiconductor devices and microprocessors. Recently, the need for this intensive simulation and modeling analysis has greatly increased, due in part to the ever-increasing complexity of these devices, as well as the lessons of experiences such as the Pentium fiasco. Simulation, modeling, testing, and validation will be even more important for designing molecular computers because of the complex specification of millions of atoms, thousands of assembly steps, as well as the simulation and modeling needed to ensure reliable, robust and efficient fabrication of the molecular devices. The software for this capacity does not exist today, but it can be extrapolated from the software currently used in molecular modeling for other applications: semi-empirical methods, ab initio methods, self-consistent field methods, Hartree-Fock methods, molecular mechanics, and simulation methods for diamondoid structures. Inasmuch as it seems clear that the application of such methods in nanotechnology will require powerful, highly parallel systems, this talk will discuss techniques and issues for performing these types of computations on parallel systems. We will describe system design issues (memory, I/O, mass storage, operating system requirements, special user interface issues, interconnects, bandwidths, and programming languages) involved in parallel methods for scalable classical, semiclassical, quantum, molecular mechanics, and continuum models; molecular nanotechnology computer-aided design (NanoCAD) techniques; visualization using virtual reality techniques of structural models and assembly sequences; software required to control mini robotic manipulators for positional control; and scalable numerical algorithms for reliability, verification and testability. There appears to be no fundamental obstacle to simulating molecular compilers and molecular computers on high performance parallel computers, just as the Boeing 777 was simulated on a computer before manufacturing it.
NASA Astrophysics Data System (ADS)
Dong, Dai; Li, Xiaoning
2015-03-01
A high-pressure solenoid valve with high flow rate and high speed is a key component in an underwater driving system. However, a traditional single-spool pilot-operated valve cannot meet the demands of both high flow rate and high speed simultaneously. A new structure for a high-pressure solenoid valve is needed to meet the demands of the underwater driving system. A novel parallel-spool pilot-operated high-pressure solenoid valve is proposed to overcome the drawbacks of the current single-spool design. Mathematical models of the opening process and flow rate of the valve are established. The opening response time of the valve is subdivided into 4 parts to analyze the properties of the opening response, and corresponding formulas for the 4 parts of the response time are derived. Key factors that influence the opening response time are analyzed. According to the mathematical model of the valve, a simulation of the opening process is carried out in MATLAB. Parameters are chosen based on theoretical analysis to design the test prototype of the new type of valve. The opening response time of the designed valve is tested by measuring the response of the current in the coil and the displacement of the main valve spool. The experimental results are in agreement with the simulated results, verifying the validity of the theoretical analysis. The experimental opening response time of the valve is 48.3 ms at a working pressure of 10 MPa. The flow capacity test shows that the largest effective area is 126 mm2 and the largest air flow rate is 2320 L/s. According to the results of the load driving test, the valve can meet the demands of the driving system. The proposed valve with parallel spools provides a new method for the design of a high-pressure valve with fast response and large flow rate.
A simple computational algorithm of model-based choice preference.
Toyama, Asako; Katahira, Kentaro; Ohira, Hideki
2017-08-01
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process. Choice data from 23 participants showed a better fit with the proposed models. More specifically, the proposed eligibility adjustment model, which assumes that the environmental model can weight the degree of the eligibility trace, can explain choices better under both model-free and model-based controls and has a simpler computational algorithm than the original model. In addition, the forgetting learning model and its variation, which assume changes in the values of unchosen actions, substantially improved the fits to the data. Overall, we show that a hybrid computational model best fits the data. The parameters used in this model succeed in capturing individual tendencies with respect to both model use in learning and exploration behavior. This computational model provides novel insights into learning with interacting model-free and model-based components.
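To make the modeling contrast concrete, here is a toy temporal-difference learner for a two-stage task in which a weight on the eligibility trace controls how much of the stage-2 prediction error reaches stage-1 values, loosely echoing the proposed eligibility adjustment idea (all names, parameter values, and the simplified task dynamics are ours, not the paper's code):

```python
import random

alpha, lambda_w = 0.3, 0.7          # learning rate, eligibility weight
q1 = {"left": 0.0, "right": 0.0}    # stage-1 action values
q2 = {"A": 0.0, "B": 0.0}           # stage-2 state values

for _ in range(1000):
    # Epsilon-greedy stage-1 choice.
    a = max(q1, key=q1.get) if random.random() > 0.1 else random.choice(list(q1))
    # Common (70%) vs. rare (30%) transition to the stage-2 state.
    s2 = "A" if (a == "left") == (random.random() < 0.7) else "B"
    reward = float(random.random() < (0.8 if s2 == "A" else 0.2))
    delta2 = reward - q2[s2]
    q2[s2] += alpha * delta2
    # Stage-1 update: immediate TD error plus trace-weighted stage-2 error.
    q1[a] += alpha * ((q2[s2] - q1[a]) + lambda_w * delta2)

print(q1, q2)
```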
NASA Astrophysics Data System (ADS)
Lenkiewicz, Przemyslaw; Pereira, Manuela; Freire, Mário M.; Fernandes, José
2013-12-01
In this article, we propose a novel image segmentation method called the whole mesh deformation (WMD) model, which aims at addressing the problems of modern medical imaging. Such problems have arisen from the combination of several factors: (1) significant growth of medical image volume sizes due to the increasing capabilities of medical acquisition devices; (2) the desire to increase the complexity of image processing algorithms in order to explore new functionality; and (3) a change in processor development, turning towards multiple processing units instead of growing bus speeds and the number of operations per second of a single processing unit. Our solution is based on the concept of deformable models and is characterized by a very effective and precise segmentation capability. The proposed WMD model uses a volumetric mesh instead of a contour or a surface to represent the segmented shapes of interest, which allows exploiting more information in the image and obtaining results in shorter times, independently of image contents. The model also offers a good ability for topology changes and allows effective parallelization of the workflow, which makes it a very good choice for large datasets. We present a precise model description, followed by experiments on artificial images and real medical data.
A Simulation Testbed for Airborne Merging and Spacing
NASA Technical Reports Server (NTRS)
Santos, Michel; Manikonda, Vikram; Feinberg, Art; Lohr, Gary
2008-01-01
The key innovation in this effort is the development of a simulation testbed for airborne merging and spacing (AM&S). We focus on concepts related to airports with Super Dense Operations where new airport runway configurations (e.g. parallel runways), sequencing, merging, and spacing are some of the concepts considered. We focus on modeling and simulating a complementary airborne and ground system for AM&S to increase efficiency and capacity of these high density terminal areas. From a ground systems perspective, a scheduling decision support tool generates arrival sequences and spacing requirements that are fed to the AM&S system operating on the flight deck. We enhanced NASA's Airspace Concept Evaluation Systems (ACES) software to model and simulate AM&S concepts and algorithms.
NASA Technical Reports Server (NTRS)
Athale, R. A.; Lee, S. H.
1978-01-01
The paper describes the fabrication and operation of an optical parallel logic (OPAL) device which performs Boolean algebraic operations on binary images. Several logic operations on two input binary images were demonstrated using an 8 x 8 device with a CdS photoconductor and a twisted nematic liquid crystal. Two such OPAL devices can be interconnected to form a half-adder circuit which is one of the essential components of a CPU in a digital signal processor.
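Since the half-adder is built from exactly two Boolean operations per pixel (the sum bit is XOR, the carry bit is AND), the device's logic can be mimicked on binary images in a few lines. This NumPy sketch only illustrates the arithmetic; the 8 × 8 array size mirrors the demonstrated device, and everything else is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)   # binary input image A
b = rng.integers(0, 2, size=(8, 8), dtype=np.uint8)   # binary input image B

# Half-adder on images: one OPAL-like stage computes XOR (the sum bit),
# a second computes AND (the carry bit), for all 64 pixels in parallel.
sum_bits = np.bitwise_xor(a, b)
carry_bits = np.bitwise_and(a, b)
assert np.all(a + b == sum_bits + 2 * carry_bits)     # the adder identity
```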
20 kHz main inverter unit [for space station power supplies]
NASA Technical Reports Server (NTRS)
Hussey, S.
1989-01-01
A proof-of-concept main inverter unit has demonstrated the operation of a pulse-width-modulated parallel resonant power stage topology as a 20-kHz ac power source driver, showing simple output regulation, parallel operation, power sharing and short-circuit operation. The use of a two-stage dc input filter controls the electromagnetic compatibility (EMC) characteristics of the dc power bus, and the use of an ac harmonic trap controls the EMC characteristics of the 20-kHz ac power bus.
The Brain's Router: A Cortical Network Model of Serial Processing in the Primate Brain
Zylberberg, Ariel; Fernández Slezak, Diego; Roelfsema, Pieter R.; Dehaene, Stanislas; Sigman, Mariano
2010-01-01
The human brain efficiently solves certain operations such as object recognition and categorization through a massively parallel network of dedicated processors. However, human cognition also relies on the ability to perform an arbitrarily large set of tasks by flexibly recombining different processors into a novel chain. This flexibility comes at the cost of a severe slowing down and a seriality of operations (100–500 ms per step). A limit on parallel processing is demonstrated in experimental setups such as the psychological refractory period (PRP) and the attentional blink (AB) in which the processing of an element either significantly delays (PRP) or impedes conscious access (AB) of a second, rapidly presented element. Here we present a spiking-neuron implementation of a cognitive architecture where a large number of local parallel processors assemble together to produce goal-driven behavior. The precise mapping of incoming sensory stimuli onto motor representations relies on a “router” network capable of flexibly interconnecting processors and rapidly changing its configuration from one task to another. Simulations show that, when presented with dual-task stimuli, the network exhibits parallel processing at peripheral sensory levels, a memory buffer capable of keeping the result of sensory processing on hold, and a slow serial performance at the router stage, resulting in a performance bottleneck. The network captures the detailed dynamics of human behavior during dual-task-performance, including both mean RTs and RT distributions, and establishes concrete predictions on neuronal dynamics during dual-task experiments in humans and non-human primates. PMID:20442869
NASA Astrophysics Data System (ADS)
Vnukov, A. A.; Shershnev, M. B.
2018-01-01
The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of the algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of the computations. Three interpolation methods were studied, formalized and adapted to image scaling. The result of the work is a program that scales images by the different methods, together with a comparison of the scaling quality achieved by each method.
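A minimal sketch of the idea, assuming nearest-neighbour interpolation as one of the studied methods and row-block decomposition as the parallelization strategy (the entry specifies neither); the function names are hypothetical:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def scale_strip(args):
    """Nearest-neighbour scaling of one horizontal strip of the output."""
    img, out_rows, factor = args
    h, w = img.shape
    src_r = (np.asarray(out_rows) / factor).astype(int).clip(0, h - 1)
    src_c = (np.arange(int(w * factor)) / factor).astype(int).clip(0, w - 1)
    return img[np.ix_(src_r, src_c)]

def scale_parallel(img, factor, workers=4):
    """Split the output rows into blocks and scale them in parallel.
    Shipping the whole image to every worker is wasteful but keeps the
    sketch short; run under `if __name__ == "__main__":` on Windows."""
    out_rows = np.arange(int(img.shape[0] * factor))
    chunks = np.array_split(out_rows, workers)
    with ProcessPoolExecutor(workers) as pool:
        strips = pool.map(scale_strip, [(img, c, factor) for c in chunks])
    return np.vstack(list(strips))
```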
Numerical modeling for the retrofit of the hydraulic cooling subsystems in operating power plant
NASA Astrophysics Data System (ADS)
AlSaqoor, S.; Alahmer, A.; Al Quran, F.; Andruszkiewicz, A.; Kubas, K.; Regucki, P.; Wędrychowicz, W.
2017-08-01
This paper presents the possibility of using numerical methods to analyze the work of hydraulic systems, taking as an example the cooling system of a power boiler's auxiliary devices. The variety of conditions under which hydraulic systems operate in specific engineering subsystems requires an individualized approach to the model solutions developed for modernizing these systems. A mathematical model of the series-parallel distribution of the cooling water was derived, and iterative methods were used to solve the resulting system of nonlinear equations. The results of the numerical calculations made it possible to analyze different variants of a modernization of the studied system and to indicate its critical elements. An economic analysis of the different options allows an investor to choose an optimal variant of a reconstruction of the installation.
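The core numerical task, iteratively solving the nonlinear balance equations of a series-parallel network, can be sketched for a toy two-branch junction; the quadratic resistance law, the coefficients, and the use of scipy's fsolve are illustrative assumptions, not the paper's model:

```python
import numpy as np
from scipy.optimize import fsolve

# Toy series-parallel junction: total flow Q splits into two parallel
# branches with quadratic (turbulent) loss coefficients k1, k2.
# Unknowns: branch flows q1, q2. All values are invented.
Q, k1, k2 = 0.12, 4.0e4, 9.0e4        # m^3/s and Pa/(m^3/s)^2

def residuals(q):
    q1, q2 = q
    return [q1 + q2 - Q,                             # continuity
            k1 * q1 * abs(q1) - k2 * q2 * abs(q2)]   # equal pressure drop

q1, q2 = fsolve(residuals, [Q / 2, Q / 2])           # Newton-type iteration
```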
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnamurthy, Dheepak
This paper is an overview of the Power System Simulation Toolbox (psst). psst is an open-source Python application for the simulation and analysis of power system models. psst simulates wholesale market operation by solving a DC Optimal Power Flow (DCOPF), a Security Constrained Unit Commitment (SCUC) and a Security Constrained Economic Dispatch (SCED). psst also includes models for the various entities in a power system such as Generator Companies (GenCos), Load Serving Entities (LSEs) and an Independent System Operator (ISO). psst features an open, modular, object-oriented architecture that makes it useful for researchers to customize, expand, and experiment beyond solving traditional problems. psst also includes a web-based Graphical User Interface (GUI) that allows for user-friendly interaction and for implementation on remote High Performance Computing (HPC) clusters for parallelized operations. This paper also provides an illustrative application of psst and benchmarks with standard IEEE test cases to show the advanced features and the performance of the toolbox.
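To illustrate the kind of market-clearing problem psst solves, the following is a generic two-bus DCOPF posed as a linear program; it uses plain scipy rather than psst's own API, and all numbers are invented:

```python
from scipy.optimize import linprog

# Two-bus DC-OPF sketch (generic LP, not psst's API): a cheap generator
# at bus 1, an expensive one at bus 2, a 100 MW load at bus 2, and a
# 60 MW limit on the single line connecting the buses.
c = [20.0, 50.0]                                # $/MWh offers for G1, G2
res = linprog(c,
              A_eq=[[1.0, 1.0]], b_eq=[100.0],  # system power balance
              A_ub=[[1.0, 0.0]], b_ub=[60.0],   # flow 1->2 equals p1 <= 60
              bounds=[(0.0, 80.0), (0.0, 80.0)])
p1, p2 = res.x                                  # -> 60 MW and 40 MW dispatch
```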
NASA Astrophysics Data System (ADS)
Bonavita, M.; Torrisi, L.
2005-03-01
A new data assimilation system has been designed and implemented at the National Center for Aeronautic Meteorology and Climatology of the Italian Air Force (CNMCA) in order to improve its operational numerical weather prediction capabilities and provide more accurate guidance to operational forecasters. The system, which is undergoing testing before operational use, is based on an “observation space” version of the 3D-VAR method for the objective analysis component, and on the High Resolution Regional Model (HRM) of the Deutscher Wetterdienst (DWD) for the prognostic component. Notable features of the system include a completely parallel (MPI+OMP) implementation of the solution of analysis equations by a preconditioned conjugate gradient descent method; correlation functions in spherical geometry with thermal wind constraint between mass and wind field; derivation of the objective analysis parameters from a statistical analysis of the innovation increments.
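The observation-space analysis step amounts to solving a symmetric positive-definite system for the innovation weights with a preconditioned conjugate gradient method. A minimal stand-in, with random matrices in place of the operational covariance operators and a simple Jacobi preconditioner (the entry does not state which preconditioner CNMCA uses):

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

# Observation-space 3D-VAR sketch: solve (H B H^T + R) w = d for the
# innovation weights, then x_a = x_b + B H^T w. Random SPD stand-ins
# replace the operational covariance operators here.
rng = np.random.default_rng(1)
n_obs = 200
G = rng.standard_normal((n_obs, n_obs))
S = G @ G.T + n_obs * np.eye(n_obs)       # stand-in for H B H^T + R
d = rng.standard_normal(n_obs)            # innovations y - H(x_b)

dinv = 1.0 / np.diag(S)                   # Jacobi (diagonal) preconditioner
M = LinearOperator((n_obs, n_obs), matvec=lambda v: dinv * v)
w, info = cg(S, d, M=M)
assert info == 0                          # 0 means the solver converged
```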
DOE Office of Scientific and Technical Information (OSTI.GOV)
Staley, Martin
2017-09-20
This high-performance ray tracing library provides very fast rendering; compact code; type flexibility through C++ "generic programming" techniques; and ease of use via an application programming interface (API) that operates independently of any GUI, on-screen display, or other enclosing application. Kip supports constructive solid geometry (CSG) models based on a wide variety of built-in shapes and logical operators, and also allows for user-defined shapes and operators to be provided. Additional features include basic texturing; input/output of models using a simple human-readable file format and with full error checking and detailed diagnostics; and support for shared data parallelism. Kip is written in pure, ANSI standard C++; is entirely platform independent; and is very easy to use. As a C++ "header only" library, it requires no build system, configuration or installation scripts, wizards, non-C++ preprocessing, makefiles, shell scripts, or external libraries.
NASA Astrophysics Data System (ADS)
Johnson, Tony; Metcalfe, Jason; Brewster, Benjamin; Manteuffel, Christopher; Jaswa, Matthew; Tierney, Terrance
2010-04-01
The proliferation of intelligent systems in today's military demands increased focus on the optimization of human-robot interactions. Traditional studies in this domain involve large-scale field tests that require humans to operate semiautomated systems under varying conditions within military-relevant scenarios. However, provided that adequate constraints are employed, modeling and simulation can be a cost-effective alternative and supplement. The current presentation discusses a simulation effort that was executed in parallel with a field test with Soldiers operating military vehicles in an environment that represented key elements of the true operational context. In this study, "constructive" human operators were designed to represent average Soldiers executing supervisory control over an intelligent ground system. The constructive Soldiers were simulated performing the same tasks as those performed by real Soldiers during a directly analogous field test. Exercising the models in a high-fidelity virtual environment provided predictive results that represented actual performance in certain aspects, such as situational awareness, but diverged in others. These findings largely reflected the quality of modeling assumptions used to design behaviors and the quality of information available on which to articulate principles of operation. Ultimately, predictive analyses partially supported expectations, with deficiencies explicable via Soldier surveys, experimenter observations, and previously-identified knowledge gaps.
Parallel Work of CO2 Ejectors Installed in a Multi-Ejector Module of Refrigeration System
NASA Astrophysics Data System (ADS)
Bodys, Jakub; Palacz, Michal; Haida, Michal; Smolka, Jacek; Nowak, Andrzej J.; Banasiak, Krzysztof; Hafner, Armin
2016-09-01
A performance analysis of the fixed ejectors installed in a multi-ejector module of a CO2 refrigeration system is presented in this study. The serial and parallel operation of the four fixed-geometry units that compose the multi-ejector pack was analyzed. The numerical simulations were performed with a validated homogeneous equilibrium model (HEM), using the computational tool ejectorPL with typical transcritical parameters at the motive nozzle in all tests. A wide range of operating conditions for supermarket applications in three different European climate zones was taken into consideration. The obtained results show high and stable performance of all the ejectors in the multi-ejector pack.
LSPRAY-V: A Lagrangian Spray Module
NASA Technical Reports Server (NTRS)
Raju, M. S.
2015-01-01
LSPRAY-V is a Lagrangian spray solver developed for application with unstructured grids and massively parallel computers. It is mainly designed to predict the flow, thermal and transport properties of a rapidly vaporizing spray encountered over a wide range of operating conditions in modern aircraft engine development. It could easily be coupled with any existing gas-phase flow and/or Monte Carlo Probability Density Function (PDF) solvers. The manual provides the user with an understanding of various models involved in the spray formulation, its code structure and solution algorithm, and various other issues related to parallelization and its coupling with other solvers. With the development of LSPRAY-V, we have advanced the state-of-the-art in spray computations in several important ways.
Parallel multiphase microflows: fundamental physics, stabilization methods and applications.
Aota, Arata; Mawatari, Kazuma; Kitamori, Takehiko
2009-09-07
Parallel multiphase microflows, which can integrate unit operations in a microchip under continuous flow conditions, are discussed. Fundamental physics, stabilization methods and some applications are shown.
Sensorimotor Learning Biases Choice Behavior: A Learning Neural Field Model for Decision Making
Schöner, Gregor; Gail, Alexander
2012-01-01
According to a prominent view of sensorimotor processing in primates, selection and specification of possible actions are not sequential operations. Rather, a decision for an action emerges from competition between different movement plans, which are specified and selected in parallel. For action choices which are based on ambiguous sensory input, the frontoparietal sensorimotor areas are considered part of the common underlying neural substrate for selection and specification of action. These areas have been shown capable of encoding alternative spatial motor goals in parallel during movement planning, and show signatures of competitive value-based selection among these goals. Since the same network is also involved in learning sensorimotor associations, competitive action selection (decision making) should not only be driven by the sensory evidence and expected reward in favor of either action, but also by the subject's learning history of different sensorimotor associations. Previous computational models of competitive neural decision making used predefined associations between sensory input and corresponding motor output. Such hard-wiring does not allow modeling of how decisions are influenced by sensorimotor learning or by changing reward contingencies. We present a dynamic neural field model which learns arbitrary sensorimotor associations with a reward-driven Hebbian learning algorithm. We show that the model accurately simulates the dynamics of action selection with different reward contingencies, as observed in monkey cortical recordings, and that it correctly predicted the pattern of choice errors in a control experiment. With our adaptive model we demonstrate how network plasticity, which is required for association learning and adaptation to new reward contingencies, can influence choice behavior. The field model provides an integrated and dynamic account for the operations of sensorimotor integration, working memory and action selection required for decision making in ambiguous choice situations. PMID:23166483
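As a toy illustration of the reward-driven Hebbian learning the model relies on, the sketch below strengthens a stimulus-action association in proportion to reward; the rule's outer-product form, the dimensions, and the learning rate are assumptions, not the paper's field equations:

```python
import numpy as np

def hebbian_update(W, stim, act, reward, lr=0.05):
    """Strengthen (or weaken) a stimulus-action association in
    proportion to reward: a reward-gated outer-product Hebbian rule."""
    W += lr * reward * np.outer(act, stim)
    return W

W = np.zeros((4, 8))          # 4 motor fields x 8 stimulus inputs
stim = np.eye(8)[2]           # one-hot stimulus
act = np.eye(4)[1]            # action that was selected
W = hebbian_update(W, stim, act, reward=1.0)
```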
Three-dimensional wideband electromagnetic modeling on massively parallel computers
NASA Astrophysics Data System (ADS)
Alumbaugh, David L.; Newman, Gregory A.; Prevost, Lydie; Shadid, John N.
1996-01-01
A method is presented for modeling the wideband, frequency-domain electromagnetic (EM) response of a three-dimensional (3-D) earth to dipole sources operating at frequencies where EM diffusion dominates the response (less than 100 kHz) up into the range where propagation dominates (greater than 10 MHz). The scheme employs the modified form of the vector Helmholtz equation for the scattered electric fields to model variations in electrical conductivity, dielectric permittivity and magnetic permeability. The use of the modified form of the Helmholtz equation allows perfectly matched layer (PML) absorbing boundary conditions to be employed through the use of complex grid stretching. Applying the finite difference operator to the modified Helmholtz equation produces a linear system of equations for which the matrix is sparse and complex symmetric. The solution is obtained using either the biconjugate gradient (BICG) or quasi-minimum residual (QMR) methods with preconditioning; in general we employ the QMR method with Jacobi scaling preconditioning because of its stability. In order to simulate larger, more realistic models than has previously been possible, the scheme has been modified to run on massively parallel (MP) computer architectures. Execution on the 1840-processor Intel Paragon has indicated a maximum model size of 280 × 260 × 200 cells with a maximum flop rate of 14.7 Gflops. Three different geologic models are simulated to demonstrate the use of the code for frequencies ranging from 100 Hz to 30 MHz and for different source types and polarizations. The simulations show that the scheme is correctly able to model the air-earth interface and the jump in the electric and magnetic fields normal to discontinuities. For frequencies greater than 10 MHz, complex grid stretching must be employed to incorporate absorbing boundaries, while below this normal (real) grid stretching can be employed.
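The linear-algebra core, a sparse complex-symmetric system solved with QMR under Jacobi scaling, can be reproduced in miniature with scipy; the 1-D tridiagonal operator below is only a stand-in for the 3-D finite-difference modified-Helmholtz matrix, and the coefficient values are invented:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import qmr, LinearOperator

# Sparse complex-symmetric system solved with QMR + Jacobi scaling.
n = 2000
k2 = 0.5 + 0.2j                           # illustrative (kh)^2 term
A = diags([-1.0, 2.0 - k2, -1.0], [-1, 0, 1],
          shape=(n, n), dtype=complex).tocsr()
b = np.ones(n, dtype=complex)

dinv = 1.0 / A.diagonal()                 # Jacobi scaling
jac = lambda v: dinv * v                  # diagonal, so its transpose acts the same
M1 = LinearOperator((n, n), matvec=jac, rmatvec=jac, dtype=complex)
x, info = qmr(A, b, M1=M1)                # info == 0 on convergence
```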
NASA Astrophysics Data System (ADS)
Jasim, S. E.; Jusoh, M. A.; Mahmud, S. N. S.; Zamani, A. H.
2018-04-01
Development of low-loss, small-size, broad-bandwidth microwave bandpass filters operating at higher frequencies is an active area of research. This paper presents a new route to design and simulate a microwave bandpass filter using finite element modelling, realizing a broad-bandwidth, low-loss, small-dimension filter operating at 10 GHz using the return loss method. The filter circuit was designed using computer-aided design (CAD) with the Ansoft HFSS software, using a four parallel coupled-line model with small dimensions (10 × 10 mm2) on a LaAlO3 substrate. The response of the filter circuit showed a high return loss of -50 dB at an operating frequency of 10.4 GHz and a broad bandwidth of 2.5 GHz, from 9.5 to 12 GHz. The results indicate that filter design and simulation using HFSS is reliable and offers the opportunity to transfer from laboratory experiments to industry.
Pernar, Luise I M; Ashley, Stanley W; Smink, Douglas S; Zinner, Michael J; Peyre, Sarah E
2012-01-01
Practicing within the Halstedian model of surgical education, academic surgeons serve dual roles as physicians to their patients and educators of their trainees. Despite this significant responsibility, few surgeons receive formal training in educational theory to inform their practice. The goal of this work was to gain an understanding of how master surgeons approach teaching uncommon and highly complex operations and to determine the educational constructs that frame their teaching philosophies and approaches. Individuals included in the study were queried using electronically distributed open-ended, structured surveys. Responses to the surveys were analyzed and grouped using grounded theory and were examined for parallels to concepts of learning theory. Academic teaching hospital. Twenty-two individuals identified as master surgeons. Twenty-one (95.5%) individuals responded to the survey. Two primary thematic clusters were identified: global approach to teaching (90.5% of respondents) and approach to intraoperative teaching (76.2%). Many of the emergent themes paralleled principles of transfer learning theory outlined in the psychology and education literature. Key elements included: conferring graduated responsibility (57.1%), encouraging development of a mental set (47.6%), fostering or expecting deliberate practice (42.9%), deconstructing complex tasks (38.1%), vertical transfer of information (33.3%), and identifying general principles to structure knowledge (9.5%). Master surgeons employ many of the principles of learning theory when teaching uncommon and highly complex operations. The findings may hold significant implications for faculty development in surgical education. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Korabel, Vasily; She, Jun; Huess, Vibeke; Woge Nielsen, Jacob; Murawsky, Jens; Nerger, Lars
2017-04-01
The potential of an efficient data assimilation (DA) scheme to improve model forecast skill has been successfully demonstrated by many operational centres around the world. The Baltic-North Sea region is one of the most heavily monitored seas: ferryboxes, buoys, ADCP moorings, shallow-water Argo floats, and research vessels are providing more and more near-real-time observations. Coastal altimetry is now providing an increasing amount of high-resolution sea level observations, which will be significantly expanded by the launch of the SWOT satellite in the coming years. This will turn operational DA into a valuable tool for improving forecast quality in the region, and it motivated us to focus on advancing DA for the Baltic Monitoring and Forecasting Centre (BAL MFC) in order to create a common framework for operational data assimilation in the Baltic Sea. We have implemented the HBM-PDAF system, based on the Parallel Data Assimilation Framework (PDAF), a highly versatile and optimised parallel suite with a choice of sequential schemes originally developed at AWI, and the hydrodynamic HIROMB-BOOS Model (HBM). In the initial phase, only satellite sea surface temperature (SST) Level 3 data have been assimilated. Several related aspects are discussed, including improvements of the forecast quality for both surface and subsurface fields, the estimation of the ensemble-based forecast error covariance, and possibilities of assimilating new types of observations, such as in-situ salinity and temperature profiles, coastal altimetry, and ice concentration.
Contributions of Hippocampus and Striatum to Memory-Guided Behavior Depend on Past Experience
Ferbinteanu, Janina
2016-01-01
The hippocampal and striatal memory systems are thought to operate independently and in parallel in supporting cognitive memory and habits, respectively. Much of the evidence for this principle comes from double dissociation data, in which damage to brain structure A causes deficits in Task 1 but not Task 2, whereas damage to structure B produces the reverse pattern of effects. Typically, animals are explicitly trained in one task. Here, we investigated whether this principle continues to hold when animals concurrently learn two types of tasks. Rats were trained on a plus maze in either a spatial navigation or a cue–response task (sequential training), whereas a third set of rats acquired both (concurrent training). Subsequently, the rats underwent either sham surgery or neurotoxic lesions of the hippocampus (HPC), medial dorsal striatum (DSM), or lateral dorsal striatum (DSL), followed by retention testing. Finally, rats in the sequential training condition also acquired the novel “other” task. When rats learned one task, HPC and DSL selectively supported spatial navigation and cue response, respectively. However, when rats learned both tasks, HPC and DSL additionally supported the behavior incongruent with the processing style of the corresponding memory system. Thus, in certain conditions, the hippocampal and striatal memory systems can operate cooperatively and in synergism. DSM significantly contributed to performance regardless of task or training procedure. Experience with the cue–response task facilitated subsequent spatial learning, whereas experience with spatial navigation delayed both concurrent and subsequent response learning. These findings suggest that there are multiple operational principles that govern memory networks. SIGNIFICANCE STATEMENT Currently, we distinguish among several types of memories, each supported by a distinct neural circuit. The memory systems are thought to operate independently and in parallel. Here, we demonstrate that the hippocampus and the dorsal striatum memory systems operate independently and in parallel when rats learn one type of task at a time, but interact cooperatively and in synergism when rats concurrently learn two types of tasks. Furthermore, new learning is modulated by past experiences. These results can be explained by a model in which independent and parallel information processing that occurs in the separate memory-related neural circuits is supplemented by information transfer between the memory systems at the level of the cortex. PMID:27307234
Electronic scraps--recovering of valuable materials from parallel wire cables.
de Araújo, Mishene Christie Pinheiro Bezerra; Chaves, Arthur Pinto; Espinosa, Denise Crocce Romano; Tenório, Jorge Alberto Soares
2008-11-01
Every year, the number of discarded electrical and electronic products increases. For this reason recycling is needed, to avoid wasting non-renewable natural resources. The objective of this work is to study the recycling of materials from parallel wire cables through unit operations of mineral processing. Parallel wire cables are basically composed of polymer and copper. The following unit operations were tested: grinding, size classification, dense medium separation, electrostatic separation, scrubbing, panning, and elutriation. The operations used yielded copper and PVC concentrates with a low degree of cross contamination. It was concluded that total liberation of the materials was accomplished after grinding to less than 3 mm, using a cage mill. Separation using panning and elutriation presented the best results in terms of recovery and cross contamination.
NASA Astrophysics Data System (ADS)
Barthélémy, S.; Ricci, S.; Morel, T.; Goutal, N.; Le Pape, E.; Zaoui, F.
2018-07-01
In the context of hydrodynamic modeling, the use of 2D models is appropriate in areas where the flow is not mono-dimensional (confluence zones, flood plains). Nonetheless, the lack of field data and computational cost constraints limit the extensive use of 2D models for operational flood forecasting. Multi-dimensional coupling offers a solution, with 1D models where the flow is mono-dimensional and local 2D models where needed. This solution allows for the representation of complex processes in the 2D models, while the simulated hydraulic state is significantly better than that of a full 1D model. In this study, coupling is implemented between three 1D sub-models and a local 2D model for a confluence on the Adour river (France). A Schwarz algorithm is implemented to guarantee the continuity of the variables at the 1D/2D interfaces, while in situ observations are assimilated in the 1D sub-models to improve results and forecasts in operational mode, as carried out by the French flood forecasting services. An implementation of the coupling and data assimilation (DA) solution with domain decomposition and task/data parallelism is proposed so that it is compatible with operational constraints. The coupling with the 2D model improves the simulated hydraulic state compared to a global 1D model, and DA improves results in both the 1D and 2D areas.
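A Schwarz-type coupling loop reduces to alternating solves with an exchange of interface values until they stop changing. The sketch below assumes scalar interface quantities (one water level, one discharge) and hypothetical solver callables; the operational system exchanges full interface states between the 1D and 2D solvers.

```python
def schwarz_couple(solve_1d, solve_2d, wl0, tol=1e-6, max_iter=50):
    """Schwarz-style coupling loop with hypothetical solver callables.
    `solve_1d(wl)` returns the discharge the 1D model passes to the 2D
    domain given the interface water level; `solve_2d(q)` returns the
    interface water level the 2D model feeds back. Iterate to a fixed
    point so both variables are continuous across the interface."""
    wl = wl0
    for _ in range(max_iter):
        q = solve_1d(wl)
        wl_new = solve_2d(q)
        if abs(wl_new - wl) < tol:
            return wl_new, q
        wl = wl_new
    raise RuntimeError("Schwarz iteration did not converge")

# Toy linear rating curves with a shared fixed point near wl = 2.02 m.
wl, q = schwarz_couple(lambda wl: 100.0 + 0.5 * wl,   # 1D: q(wl)
                       lambda q: 0.02 * q,            # 2D: wl(q)
                       wl0=0.0)
```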
Multidimensional Simulation Applied to Water Resources Management
NASA Astrophysics Data System (ADS)
Camara, A. S.; Ferreira, F. C.; Loucks, D. P.; Seixas, M. J.
1990-09-01
A framework for an integrated decision aiding simulation (IDEAS) methodology using numerical, linguistic, and pictorial entities and operations is introduced. IDEAS relies upon traditional numerical formulations, logical rules to handle linguistic entities with linguistic values, and a set of pictorial operations. Pictorial entities are defined by their shape, size, color, and position. Pictorial operators include reproduction (copy of a pictorial entity), mutation (expansion, rotation, translation, change in color), fertile encounters (intersection, reunion), and sterile encounters (absorption). Interaction between numerical, linguistic, and pictorial entities is handled through logical rules or a simplified vector calculus operation. This approach is shown to be applicable to various environmental and water resources management analyses using a model to assess the impacts of an oil spill. Future developments, including IDEAS implementation on parallel processing machines, are also discussed.
Monolithic Parallel Tandem Organic Photovoltaic Cell with Transparent Carbon Nanotube Interlayer
NASA Technical Reports Server (NTRS)
Tanaka, S.; Mielczarek, K.; Ovalle-Robles, R.; Wang, B.; Hsu, D.; Zakhidov, A. A.
2009-01-01
We demonstrate an organic photovoltaic cell with a monolithic tandem structure in parallel connection. Transparent multiwalled carbon nanotube sheets are used as an interlayer anode electrode for this parallel tandem. The characteristics of front and back cells are measured independently. The short circuit current density of the parallel tandem cell is larger than the currents of each individual cell. The wavelength dependence of photocurrent for the parallel tandem cell shows the superposition spectrum of the two spectral sensitivities of the front and back cells. The monolithic three-electrode photovoltaic cell indeed operates as a parallel tandem with improved efficiency.
Application of a Scalable, Parallel, Unstructured-Grid-Based Navier-Stokes Solver
NASA Technical Reports Server (NTRS)
Parikh, Paresh
2001-01-01
A parallel version of an unstructured-grid based Navier-Stokes solver, USM3Dns, previously developed for efficient operation on a variety of parallel computers, has been enhanced to incorporate upgrades made to the serial version. The resultant parallel code has been extensively tested on a variety of problems of aerospace interest and on two sets of parallel computers to understand and document its characteristics. An innovative grid renumbering construct and use of non-blocking communication are shown to produce superlinear computing performance. Preliminary results from parallelization of a recently introduced "porous surface" boundary condition are also presented.
Forecasting of wet snow avalanche activity: Proof of concept and operational implementation
NASA Astrophysics Data System (ADS)
Gobiet, Andreas; Jöbstl, Lisa; Rieder, Hannes; Bellaire, Sascha; Mitterer, Christoph
2017-04-01
State-of-the-art tools for the operational assessment of avalanche danger include field observations, recordings from automatic weather stations, meteorological analyses and forecasts, and recently also indices derived from snowpack models. In particular, an index for identifying the onset of wet-snow avalanche cycles (LWCindex) has been demonstrated to be useful. However, its value for operational avalanche forecasting is currently limited, since detailed, physically based snowpack models are usually driven by meteorological data from automatic weather stations only and therefore have no prognostic ability. Since avalanche risk management relies heavily on timely information and early warnings, many avalanche services in Europe now issue forecasts for the following days instead of the traditional assessment of the current avalanche danger. In this context, the prognostic operation of detailed snowpack models has recently been the objective of extensive research. In this study a new, observationally constrained setup for forecasting the onset of wet-snow avalanche cycles with the detailed snow cover model SNOWPACK is presented and evaluated. Based on data from weather stations and different numerical weather prediction models, we demonstrate that forecasts of the LWCindex as an indicator of wet-snow avalanche cycles can be useful for operational warning services, but are so far not reliable enough to be used as a single warning tool without considering other factors. Therefore, further development currently focuses on improving the forecasts by applying ensemble techniques and suitable post-processing approaches to the output of numerical weather prediction models. In parallel, the prognostic meteo-snow model chain has been used operationally by two regional avalanche warning services in Austria since winter 2016/2017 for the first time. Experiences from the first operational season and first results from current model developments will be reported.
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
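The essence of a 1-bit serial processing element is one full-adder evaluation per clock, with the carry held in a one-bit register, so adding m-bit operands takes O(m) cycles. A bit-level sketch using LSB-first bit lists (purely illustrative, not the paper's design):

```python
def bit_serial_add(a_bits, b_bits):
    """Add two LSB-first bit vectors the way a 1-bit serial PE would:
    one full-adder evaluation per cycle, carry held in a 1-bit register."""
    carry, out = 0, []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                  # sum bit this cycle
        carry = (a & b) | (carry & (a ^ b))        # next-cycle carry
    out.append(carry)
    return out

# 6 + 3: LSB-first [0,1,1] + [1,1,0] -> 9, i.e. LSB-first [1,0,0,1]
assert bit_serial_add([0, 1, 1], [1, 1, 0]) == [1, 0, 0, 1]
```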
Optimization of atmospheric transport models on HPC platforms
NASA Astrophysics Data System (ADS)
de la Cruz, Raúl; Folch, Arnau; Farré, Pau; Cabezas, Javier; Navarro, Nacho; Cela, José María
2016-12-01
The performance and scalability of atmospheric transport models on high performance computing environments is often far from optimal for multiple reasons including, for example, sequential input and output, synchronous communications, work unbalance, memory access latency or lack of task overlapping. We investigate how different software optimizations and porting to non general-purpose hardware architectures improve code scalability and execution times considering, as an example, the FALL3D volcanic ash transport model. To this purpose, we implement the FALL3D model equations in the WARIS framework, a software designed from scratch to solve in a parallel and efficient way different geoscience problems on a wide variety of architectures. In addition, we consider further improvements in WARIS such as hybrid MPI-OMP parallelization, spatial blocking, auto-tuning and thread affinity. Considering all these aspects together, the FALL3D execution times for a realistic test case running on general-purpose cluster architectures (Intel Sandy Bridge) decrease by a factor between 7 and 40 depending on the grid resolution. Finally, we port the application to Intel Xeon Phi (MIC) and NVIDIA GPUs (CUDA) accelerator-based architectures and compare performance, cost and power consumption on all the architectures. Implications on time-constrained operational model configurations are discussed.
Perceptual and neural responses to sweet taste in humans and rodents.
Lemon, Christian H
2015-08-01
This mini-review discusses some of the parallels between rodent neurophysiological and human psychophysical data concerning temperature effects on sweet taste. "Sweet" is an innately rewarding taste sensation that is associated in part with foods that contain calories in the form of sugars. Humans and other mammals can show unconditioned preference for select sweet stimuli. Such preference is poised to influence diet selection and, in turn, nutritional status, which underscores the importance of delineating the physiological mechanisms for sweet taste with respect to their influence on human health. Advances in our knowledge of the biology of sweet taste in humans have arisen in part through studies on mechanisms of gustatory processing in rodent models. Along this line, recent work has revealed that there are operational parallels in neural systems for sweet taste between mice and humans, as indexed by similarities in the effects of temperature on central neurophysiological and psychophysical responses to sucrose in these species. Such association strengthens the postulate that rodents can serve as effective models of particular mechanisms of appetitive taste processing. Data supporting this link are discussed here, as are rodent and human data that shed light on relationships between mechanisms for sweet taste and ingestive disorders, such as alcohol abuse. Rodent models have utility for understanding mechanisms of taste processing that may pertain to human flavor perception. Importantly, there are limitations to generalizing data from rodents, although parallels across species do exist.
PARALLEL PERTURBATION MODEL FOR CYCLE TO CYCLE VARIABILITY PPM4CCV
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ameen, Muhsin Mohammed; Som, Sibendu
This code consists of a Fortran 90 implementation of the parallel perturbation model to compute cyclic variability in spark-ignition (SI) engines. Cycle-to-cycle variability (CCV) is known to be detrimental to SI engine operation, resulting in partial burn and knock and an overall reduction in the reliability of the engine. Numerical prediction of CCV in SI engines is extremely challenging for two key reasons: (i) high-fidelity methods such as large eddy simulation (LES) are required to accurately capture the in-cylinder turbulent flow field, and (ii) CCV is experienced over long timescales, so the simulations need to be performed for hundreds of consecutive cycles. In the new technique, the strategy is to perform multiple parallel simulations, each of which encompasses 2-3 cycles, by effectively perturbing the simulation parameters such as the initial and boundary conditions. The PPM4CCV code is a pre-processing code and can be coupled with any engine CFD code. PPM4CCV was coupled with the Converge CFD code, and a 10-fold speedup over conventional multi-cycle LES was demonstrated in predicting the CCV of a motored engine. Recently, the model is also being applied to fired engines, including port fuel injected (PFI) and direct-injection spark-ignition engines, and the preliminary results are very encouraging.
Computer modeling of fan-exit-splitter spacing effects on F100 response to distortion
NASA Technical Reports Server (NTRS)
Shaw, M.; Murdoch, R. W.
1982-01-01
The distortion response of the F100(3) engine was affected by the fan exit splitter configuration. The sensitivity of a proximate-splitter fan is calculated to be slightly greater than that of a remote-splitter configuration with identical airfoils. Predicted response was based upon a multiple-segment parallel compressor model modified to include a bypass-ratio representation that affects the performance characteristics of the last rotor and intermediate case struts. The predicted distortion response required an accurate definition of pre- and post-stall undistorted row operation.
Parallel database search and prime factorization with magnonic holographic memory devices
NASA Astrophysics Data System (ADS)
Khitun, Alexander
2015-12-01
In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by a phased array of spin wave generating elements, allowing the production of phase patterns of arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of the numerical simulations of the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over conventional digital logic circuits in special-task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.
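The match-by-interference principle can be caricatured in software: encode bits as phases of unit waves and read out the magnitude of their superposition, which peaks only when query and entry agree everywhere. This is an analogy to the phase/voltage correlation described above, not a model of the device physics; all names are hypothetical.

```python
import numpy as np

def match_strength(entry_bits, query_bits):
    """Constructive interference flags a match: each agreeing bit adds
    +1 to the superposition, each disagreeing bit adds e^{i*pi} = -1."""
    phases = np.pi * (np.asarray(entry_bits) ^ np.asarray(query_bits))
    return abs(np.exp(1j * phases).sum())   # equals n only on exact match

db = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 1, 0, 1]]
query = [1, 1, 0, 0]
strengths = [match_strength(e, query) for e in db]   # -> [0.0, 4.0, 0.0]
```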
NASA Astrophysics Data System (ADS)
Krappel, Timo; Riedelbauch, Stefan; Jester-Zuerker, Roland; Jung, Alexander; Flurl, Benedikt; Unger, Friedeman; Galpin, Paul
2016-11-01
The operation of Francis turbines in part-load conditions causes high fluctuations and dynamic loads in the turbine and especially in the draft tube. At the hub of the runner outlet a rotating vortex rope within a low-pressure zone arises and propagates into the draft tube cone. The investigated part-load operating point is at about 72% of the best-efficiency discharge. To reduce the possible influence of boundary conditions on the solution, a flow simulation of a complete Francis turbine is conducted, consisting of spiral case, stay and guide vanes, runner and draft tube. As the flow has a strong swirling component at the chosen operating point, it is very challenging to accurately predict the flow and in particular the flow losses in the diffusor. The goal of this study is to reach a significantly better numerical prediction of this flow type. This is achieved by an improved resolution of small turbulent structures: the Scale Adaptive Simulation SAS-SST turbulence model, a scale-resolving turbulence model, is applied and compared to the widely used RANS-SST turbulence model. The largest mesh contains 300 million elements, which achieves LES-like resolution throughout much of the computational domain. The simulations are evaluated in terms of the hydraulic losses in the machine, the velocity field, pressure oscillations in the draft tube and visual comparisons of turbulent flow structures. A pre-release version of ANSYS CFX 17.0 is used in this paper, as this CFD solver scales in parallel up to several thousand cores for this application, which includes a transient rotor-stator interface to support the relative motion between the runner and the stationary portions of the water turbine.
NASA Technical Reports Server (NTRS)
Campbell, R. H.; Essick, Ray B.; Johnston, Gary; Kenny, Kevin; Russo, Vince
1987-01-01
Project EOS is studying the problems of building adaptable real-time embedded operating systems for the scientific missions of NASA. Choices (A Class Hierarchical Open Interface for Custom Embedded Systems) is an operating system designed and built by Project EOS to address the following specific issues: the software architecture for adaptable embedded parallel operating systems, the achievement of high-performance and real-time operation, the simplification of interprocess communications, the isolation of operating system mechanisms from one another, and the separation of mechanisms from policy decisions. Choices is written in C++ and runs on a ten processor Encore Multimax. The system is intended for use in constructing specialized computer applications and research on advanced operating system features including fault tolerance and parallelism.
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods
NASA Astrophysics Data System (ADS)
Xie, Lang; Luo, Yi-han; Bao, Qi-liang
2013-08-01
GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for GPU hardware architectures is of great significance. In order to address the high computational complexity and poor real-time performance of blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is used to restore the image with hardly any recursion or iteration. Combining the algorithm's data intensiveness and data-parallel structure with the GPU execution model of single instruction, multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed, which is suitable for GPU stream computing. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and the midfrequency-based filtering. For better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in the frequency domain, after optimization of the data access and of the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data so that the transfer rate works around the memory-bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.
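While the class-G PSF estimation is GPU-specific, the underlying midfrequency filtering step is ordinary frequency-domain masking. A sequential NumPy sketch of such a band-limited filter, with invented cutoff values (the paper's exact filter design is not reproduced here):

```python
import numpy as np

def midband_filter(img, lo=0.05, hi=0.35):
    """Keep only a mid-frequency annulus of the spectrum: a simplified
    stand-in for a midfrequency-based restoration filter."""
    F = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))[:, None]
    fx = np.fft.fftshift(np.fft.fftfreq(w))[None, :]
    r = np.hypot(fy, fx)                  # radial frequency, 0 to ~0.71
    mask = (r >= lo) & (r <= hi)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```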
A Joint Method of Envelope Inversion Combined with Hybrid-domain Full Waveform Inversion
NASA Astrophysics Data System (ADS)
CUI, C.; Hou, W.
2017-12-01
Full waveform inversion (FWI) aims to construct high-precision subsurface models by fully using the information in seismic records, including amplitude, travel time, phase and so on. However, high non-linearity and the absence of low-frequency information in seismic data lead to the well-known cycle-skipping problem and make the inversion easily fall into local minima. In addition, 3D inversion methods that are based on the acoustic approximation ignore the elastic effects of the real seismic field, making inversion harder. As a result, the accuracy of the final inversion result relies heavily on the quality of the initial model. To improve the stability and quality of inversion results, multi-scale inversion, which reconstructs the subsurface model from low to high frequencies, is applied. However, the absence of very low frequencies (< 3 Hz) in field data is still a bottleneck in FWI. By extracting ultra-low-frequency data from field data, envelope inversion is able to recover a low-wavenumber model with a demodulation operator (the envelope operator), even though the low-frequency data do not really exist in the field data. To improve the efficiency and viability of the inversion, in this study we propose a joint method of envelope inversion combined with hybrid-domain FWI. First, we developed 3D elastic envelope inversion, deriving the misfit function and the corresponding gradient operator. Then we performed hybrid-domain FWI with the envelope inversion result as the initial model, which provides the low-wavenumber component of the model. Here, forward modeling is implemented in the time domain and inversion in the frequency domain. To accelerate the inversion, we adopt CPU/GPU heterogeneous computing techniques with two levels of parallelism. At the first level, the inversion tasks are decomposed and assigned to each computation node by shot number. At the second level, GPU multithreaded programming is used for the computation tasks in each node, including forward modeling, envelope extraction, DFT (discrete Fourier transform) calculation and gradient calculation. Numerical tests demonstrated that the combined envelope inversion + hybrid-domain FWI obtains a much more faithful and accurate result than conventional hybrid-domain FWI, and that the CPU/GPU heterogeneous parallel computation substantially improves performance.
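The demodulation (envelope) operator at the heart of envelope inversion can be illustrated with the analytic signal: the envelope of a band-limited wavelet varies far more slowly than the wavelet itself, supplying the ultra-low-frequency content the inversion exploits. A one-trace sketch with an assumed 30 Hz wavelet:

```python
import numpy as np
from scipy.signal import hilbert

# Envelope (demodulation) of a synthetic seismic trace: the envelope of
# a band-limited wavelet carries low-frequency information even though
# those frequencies are absent from the trace itself.
t = np.linspace(0, 1, 1000)
trace = np.exp(-60 * (t - 0.4) ** 2) * np.cos(2 * np.pi * 30 * t)
envelope = np.abs(hilbert(trace))   # recovers the slow Gaussian bump
```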
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hornung, Richard D.; Hones, Holger E.
The RAJA Performance Suite is designed to evaluate performance of the RAJA performance portability library on a wide variety of important high performance computing (HPC) algorithmic kernels. These kernels assess compiler optimizations and various parallel programming model backends accessible through RAJA, such as OpenMP, CUDA, etc. The initial version of the suite contains 25 computational kernels, each of which appears in 6 variants: Baseline Sequential, RAJA Sequential, Baseline OpenMP, RAJA OpenMP, Baseline CUDA, RAJA CUDA. All variants of each kernel perform essentially the same mathematical operations and the loop body code for each kernel is identical across all variants. There are a few kernels, such as those that contain reduction operations, that require CUDA-specific coding for their CUDA variants. Actual computer instructions executed and how they run in parallel differ depending on the parallel programming model backend used and which optimizations are performed by the compiler used to build the Performance Suite executable. The Suite will be used primarily by RAJA developers to perform regular assessments of RAJA performance across a range of hardware platforms and compilers as RAJA features are being developed. It will also be used by LLNL hardware and software vendor partners for defining requirements for future computing platform procurements and acceptance testing. In particular, the RAJA Performance Suite will be used for compiler acceptance testing of the upcoming CORAL/Sierra machine (initial LLNL delivery expected in late 2017/early 2018) and the CORAL-2 procurement. The Suite will also be used to generate concise source code reproducers of compiler and runtime issues we uncover so that we may provide them to relevant vendors to be fixed.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
Tracking moving radar targets with parallel, velocity-tuned filters
Bickel, Douglas L.; Harmony, David W.; Bielek, Timothy P.; Hollowell, Jeff A.; Murray, Margaret S.; Martinez, Ana
2013-04-30
Radar data associated with radar illumination of a movable target is processed to monitor motion of the target. A plurality of filter operations are performed in parallel on the radar data so that each filter operation produces target image information. The filter operations are defined to have respectively corresponding velocity ranges that differ from one another. The target image information produced by one of the filter operations represents the target more accurately than the target image information produced by the remainder of the filter operations when a current velocity of the target is within the velocity range associated with the one filter operation. In response to the current velocity of the target being within the velocity range associated with the one filter operation, motion of the target is tracked based on the target image information produced by the one filter operation.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
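The accelerated kernel is, at bottom, a large matrix product between precomputed daylight coefficients and hourly sky vectors. A CPU-side sketch of that product with illustrative sizes (the 146-element sky vector matches the entry above; the other dimensions and matrix names are assumptions):

```python
import numpy as np

# Annual daylighting as one matrix multiplication. Matrix contents are
# random stand-ins, not Radiance data.
rng = np.random.default_rng(0)
n_sensors, n_sky, n_hours = 500, 146, 8760
flux = rng.random((n_sensors, n_sky))    # precomputed daylight coefficients
sky = rng.random((n_sky, n_hours))       # one 146-element sky vector per hour

# The per-hour loop collapses into one large product: this is the
# operation the paper offloads to the GPU with OpenCL.
illuminance = flux @ sky                 # (500, 8760) annual results
```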
Parallel Grid Manipulations in Earth Science Calculations
NASA Technical Reports Server (NTRS)
Sawyer, W.; Lucchesi, R.; daSilva, A.; Takacs, L. L.
1999-01-01
The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center is moving its data assimilation system to massively parallel computing platforms. This parallel implementation of GEOS DAS will be used in the DAO's normal activities, which include reanalysis of data, and operational support for flight missions. Key components of GEOS DAS, including the gridpoint-based general circulation model and a data analysis system, are currently being parallelized. The parallelization of GEOS DAS is also one of the HPCC Grand Challenge Projects. The GEOS-DAS software employs several distinct grids. Some examples are: an observation grid- an unstructured grid of points at which observed or measured physical quantities from instruments or satellites are associated- a highly-structured latitude-longitude grid of points spanning the earth at given latitude-longitude coordinates at which prognostic quantities are determined, and a computational lat-lon grid in which the pole has been moved to a different location to avoid computational instabilities. Each of these grids has a different structure and number of constituent points. In spite of that, there are numerous interactions between the grids, e.g., values on one grid must be interpolated to another, or, in other cases, grids need to be redistributed on the underlying parallel platform. The DAO has designed a parallel integrated library for grid manipulations (PILGRIM) to support the needed grid interactions with maximum efficiency. It offers a flexible interface to generate new grids, define transformations between grids and apply them. Basic communication is currently MPI, however the interfaces defined here could conceivably be implemented with other message-passing libraries, e.g., Cray SHMEM, or with shared-memory constructs. The library is written in Fortran 90. First performance results indicate that even difficult problems, such as above-mentioned pole rotation- a sparse interpolation with little data locality between the physical lat-lon grid and a pole rotated computational grid- can be solved efficiently and at the GFlop/s rates needed to solve tomorrow's high resolution earth science models. In the subsequent presentation we will discuss the design and implementation of PILGRIM as well as a number of the problems it is required to solve. Some conclusions will be drawn about the potential performance of the overall earth science models on the supercomputer platforms foreseen for these problems.
Zhao, Jing; Zong, Haili
2018-01-01
In this paper, we propose parallel and cyclic iterative algorithms for solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators. We also combine the cyclic and parallel iterative processes and propose two mixed iterative algorithms. None of these algorithms needs any prior information about the operator norms. Under mild assumptions, we prove weak convergence of the proposed iterative sequences in Hilbert spaces. As applications, we obtain several iterative algorithms to solve the multiple-set split equality problem.
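For orientation, a representative simultaneous (parallel) iteration for the split equality problem, find x in C and y in Q with Ax = By, takes the following form in the literature; this is an illustrative scheme with step size gamma and firmly quasi-nonexpansive operators U and T, not necessarily the authors' exact algorithm:

    x_{n+1} = U\big(x_n - \gamma A^{*}(Ax_n - By_n)\big),
    \qquad
    y_{n+1} = T\big(y_n + \gamma B^{*}(Ax_n - By_n)\big).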
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
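The "multiple ODEs in one instruction stream" idea can be sketched as a batched, fixed-step Runge-Kutta stage in which the batch dimension is vectorized. A minimal Python/numpy rendering (the paper's solvers are adaptive and written in OpenCL, so this only conveys the flavor of the SIMD layout):

    import numpy as np

    def rk4_step_batch(f, t, Y, h):
        # Y: (n_systems, n_species); one instruction stream advances the
        # whole batch in lockstep, the essence of the SIMD strategy.
        k1 = f(t, Y)
        k2 = f(t + h / 2, Y + h / 2 * k1)
        k3 = f(t + h / 2, Y + h / 2 * k2)
        k4 = f(t + h, Y + h * k3)
        return Y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

Adaptive step-size control breaks this lockstep when systems in a batch want different step sizes, which is exactly the divergence effect the paper measures.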
ParCAT: A Parallel Climate Analysis Toolkit
NASA Astrophysics Data System (ADS)
Haugen, B.; Smith, B.; Steed, C.; Ricciuto, D. M.; Thornton, P. E.; Shipman, G.
2012-12-01
Climate science has employed increasingly complex models and simulations to analyze the past and predict the future of our climate. The size and dimensionality of climate simulation data has been growing with the complexity of the models. This growth in data is creating a widening gap between the data being produced and the tools necessary to analyze large, high dimensional data sets. With single-run data sets growing into the tens, hundreds, and even thousands of gigabytes, parallel computing tools are becoming a necessity in order to analyze and compare climate simulation data. The Parallel Climate Analysis Toolkit (ParCAT) provides basic tools that efficiently use parallel computing techniques to narrow the gap between data set size and analysis tools. ParCAT was created as a collaborative effort between climate scientists and computer scientists in order to provide efficient parallel implementations of the computing tools that are of use to climate scientists. Some of the basic functionalities included in the toolkit are the ability to compute spatio-temporal means and variances, differences between two runs, and histograms of the values in a data set. ParCAT is designed to facilitate the "heavy lifting" that is required for large, multidimensional data sets. The toolkit does not focus on performing the final visualizations and presentation of results but rather on reducing large data sets to smaller, more manageable summaries. The output from ParCAT is provided in commonly used file formats (NetCDF, CSV, ASCII) to allow for simple integration with other tools. The toolkit is currently implemented as a command line utility, but will likely also provide a C library for developers interested in tighter software integration. Elements of the toolkit are already being incorporated into projects such as UV-CDAT and CMDX. There is also an effort underway to implement portions of the CCSM Land Model Diagnostics package using ParCAT in conjunction with Python and gnuplot. ParCAT is implemented in C to provide efficient file I/O. The file I/O operations in the toolkit use the parallel-netcdf library; this enables the code to use the parallel I/O capabilities of modern HPC systems. Analysis that currently requires an estimated 12+ hours with the traditional CCSM Land Model Diagnostics Package can now be performed in as little as 30 minutes on a single desktop workstation and in a few minutes for relatively small jobs completed on modern HPC systems such as ORNL's Jaguar.
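The flavor of such a parallel statistic, a global mean assembled from per-process partial sums, can be sketched with mpi4py (the slab file naming is hypothetical; ParCAT itself is C with parallel-netcdf):

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    local = np.load(f"slab_{comm.rank}.npy")        # this rank's share of the data
    partial = np.array([local.sum(), local.size], dtype="d")
    total = np.empty(2)
    comm.Allreduce(partial, total, op=MPI.SUM)      # combine partial sums
    mean = total[0] / total[1]                      # identical on every rank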
Analytical Assessment of Simultaneous Parallel Approach Feasibility from Total System Error
NASA Technical Reports Server (NTRS)
Madden, Michael M.
2014-01-01
In a simultaneous paired approach to closely-spaced parallel runways, a pair of aircraft flies in close proximity on parallel approach paths. The aircraft pair must maintain a longitudinal separation within a range that avoids wake encounters and, if one of the aircraft blunders, avoids collision. Wake avoidance defines the rear gate of the longitudinal separation. The lead aircraft generates a wake vortex that, with the aid of crosswinds, can travel laterally onto the path of the trail aircraft. As runway separation decreases, the wake has less distance to traverse to reach the path of the trail aircraft. The total system error of each aircraft further reduces this distance. The total system error is often modeled as a probability distribution function. Therefore, Monte-Carlo simulations are a favored tool for assessing a "safe" rear-gate. However, safety for paired approaches typically requires that a catastrophic wake encounter be a rare one-in-a-billion event during normal operation. Using a Monte-Carlo simulation to assert this event rarity with confidence requires a massive number of runs. Such large runs do not lend themselves to rapid turn-around during the early stages of investigation when the goal is to eliminate the infeasible regions of the solution space and to perform trades among the independent variables in the operational concept. One can employ statistical analysis using simplified models more efficiently to narrow the solution space and identify promising trades for more in-depth investigation using Monte-Carlo simulations. These simple, analytical models not only have to address the uncertainty of the total system error but also the uncertainty in navigation sources used to alert an abort of the procedure. This paper presents a method for integrating total system error, procedure abort rates, avionics failures, and surveillance errors into a statistical analysis that identifies the likely feasible runway separations for simultaneous paired approaches.
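The "massive number of runs" can be made concrete with a standard sample-size bound: to demonstrate a failure probability p at confidence level C with zero observed failures, the number of independent Monte-Carlo trials n must satisfy (1-p)^n <= 1-C, i.e. (illustrative arithmetic, not a figure from the paper)

    n \;\ge\; \frac{\ln(1-C)}{\ln(1-p)} \;\approx\; \frac{-\ln(1-C)}{p},

so confirming p = 10^-9 at 95% confidence takes roughly 3 x 10^9 blunder-free runs, the run-count burden that motivates the analytical screening described above.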
GASPACHO: a generic automatic solver using proximal algorithms for convex huge optimization problems
NASA Astrophysics Data System (ADS)
Goossens, Bart; Luong, Hiêp; Philips, Wilfried
2017-08-01
Many inverse problems (e.g., demosaicking, deblurring, denoising, image fusion, HDR synthesis) share various similarities: degradation operators are often modeled by a specific data fitting function while image prior knowledge (e.g., sparsity) is incorporated by additional regularization terms. In this paper, we investigate automatic algorithmic techniques for evaluating proximal operators. These algorithmic techniques also enable efficient calculation of adjoints from linear operators in a general matrix-free setting. In particular, we study the simultaneous-direction method of multipliers (SDMM) and the parallel proximal algorithm (PPXA) solvers and show that the automatically derived implementations are well suited for both single-GPU and multi-GPU processing. We demonstrate this approach for an Electron Microscopy (EM) deconvolution problem.
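As a concrete example of the proximal operators such solvers evaluate, the prox of the l1 regularizer (a common sparsity prior) reduces to elementwise soft-thresholding. A minimal hand-written sketch (GASPACHO derives such operators automatically; this one is only for illustration):

    import numpy as np

    def prox_l1(v, lam):
        # prox of lam * ||.||_1 at v: elementwise soft-thresholding
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)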
Dosimetry of typical transcranial magnetic stimulation devices
NASA Astrophysics Data System (ADS)
Lu, Mai; Ueno, Shoogo
2010-05-01
The therapeutic staff using transcranial magnetic stimulation (TMS) devices can be exposed to magnetic pulses. In this paper, the dependence of the currents induced in a realistic human body model on coil shape, on the distance between the coil and the model, and on the rotation of the coil in space is investigated by employing the impedance method. It was found that the figure-of-eight coil has less leakage magnetic field and induces a lower current density in the body than the round coil. The TMS power supply cables play an important role in the current density induced in the human body. The current density induced in the TMS operator decreased as the coil was rotated from the parallel to the perpendicular position. Our study shows that the TMS operator should stand at least 110 cm away from the coil.
Evaluation of fault-tolerant parallel-processor architectures over long space missions
NASA Technical Reports Server (NTRS)
Johnson, Sally C.
1989-01-01
The impact of a five year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP), is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
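The kind of reliability arithmetic behind such requirements can be sketched as a k-of-n survival probability with exponentially distributed processor failures; the failure rate and spare count below are illustrative placeholders, not figures from the study:

    import math

    def k_of_n_reliability(n, k, lam, t):
        # Probability that at least k of n processors survive to time t,
        # assuming independent exponential failures at rate lam (per hour).
        r = math.exp(-lam * t)
        return sum(math.comb(n, i) * r**i * (1 - r)**(n - i)
                   for i in range(k, n + 1))

    # e.g., 256 required processors plus 16 hypothetical spares, 5-year mission
    print(k_of_n_reliability(n=272, k=256, lam=1e-6, t=5 * 8760))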
Fast parallel molecular algorithms for DNA-based computation: factoring integers.
Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui
2005-06-01
The RSA public-key cryptosystem is an algorithm that converts input data to an unrecognizable encryption and converts the unrecognizable data back into its original decryption form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, a breakthrough in basic biological operations using a molecular computer. To achieve this, we propose three DNA-based algorithms (a parallel subtractor, a parallel comparator, and parallel modular arithmetic) and formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
Dispatching packets on a global combining network of a parallel computer
Almasi, Gheorghe [Ardsley, NY; Archer, Charles J [Rochester, MN
2011-07-19
Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
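A toy rendering of the packetizing step, with a hypothetical header layout (the real packet format is defined by the global combining network and is not given in the abstract):

    import struct

    HEADER = struct.Struct("!HHI")  # op_id, op_type, sequence (assumed layout)

    def packetize(message: bytes, op_id: int, op_type: int, payload=240):
        # Every packet carries the operation identifier and operation type,
        # so the target node can dispatch each packet without extra state.
        return [HEADER.pack(op_id, op_type, seq) + message[off:off + payload]
                for seq, off in enumerate(range(0, len(message), payload))]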
Towards a large-scale scalable adaptive heart model using shallow tree meshes
NASA Astrophysics Data System (ADS)
Krause, Dorian; Dickopf, Thomas; Potse, Mark; Krause, Rolf
2015-10-01
Electrophysiological heart models are sophisticated computational tools that place high demands on the computing hardware due to the high spatial resolution required to capture the steep depolarization front. To address this challenge, we present a novel adaptive scheme for resolving the depolarization front accurately using adaptivity in space. Our adaptive scheme is based on locally structured meshes. These tensor meshes in space are organized in a parallel forest of trees, which allows us to resolve complicated geometries and to realize high variations in the local mesh sizes with a minimal memory footprint in the adaptive scheme. We discuss both a non-conforming mortar element approximation and a conforming finite element space and present an efficient technique for the assembly of the respective stiffness matrices using matrix representations of the inclusion operators into the product space on the so-called shallow tree meshes. We analyzed the parallel performance and scalability for a two-dimensional ventricle slice as well as for a full large-scale heart model. Our results demonstrate that the method has good performance and high accuracy.
Program For Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.
1991-01-01
User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
Role of the Controller in an Integrated Pilot-Controller Study for Parallel Approaches
NASA Technical Reports Server (NTRS)
Verma, Savvy; Kozon, Thomas; Ballinger, Debbi; Lozito, Sandra; Subramanian, Shobana
2011-01-01
Closely spaced parallel runway operations have been found to increase capacity within the National Airspace System, but poor visibility conditions reduce the use of these operations [1]. Previous research examined the concepts and procedures related to parallel runways [2][4][5]. However, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot's and controller's procedures and information requirements for creating aircraft pairs for closely spaced parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (+/- 10 s error) at a coupling point 12 nmi from the runway threshold. In this paper, the role of the controller, as examined in an integrated study of controllers and pilots, is presented. The controllers utilized a pairing scheduler and new pairing interfaces to help create and maintain aircraft pairs in a high-fidelity, human-in-the-loop simulation experiment. Results show that the controllers worked as a team to achieve pairing between aircraft, and the level of inter-controller coordination increased when the aircraft in a pair belonged to different sectors. Controller feedback did not reveal overreliance on, or complacency with, the pairing automation or pairing procedures.
NASA Astrophysics Data System (ADS)
Bernadowski, Timothy Adam, Jr.
Carbon dioxide in the Martian atmosphere can be converted to oxygen during high temperature electrolysis for use in life-support and fuel systems on manned missions to the red planet. During electrolysis of carbon dioxide to produce oxygen, carbon can deposit on the electrolysis cell resulting in lower efficiency and possibly cell damage. This would be detrimental, especially when the oxygen product is used as the key element of a space life support system. In this thesis, a theoretical model was developed to predict hazardous carbon deposition conditions under various operating conditions within the Martian atmosphere. The model can be used as a guide to determine the ideal operating conditions of the high-temperature oxygen production system. A parallel experimental investigation is underway to evaluate the accuracy of the theoretical model. The experimental design, cell fabrication, and some preliminary results as well as future work recommendations are also presented in this thesis.
Adsorbent testing and mathematical modeling of a solid amine regenerative CO2 and H2O removal system
NASA Technical Reports Server (NTRS)
Jeng, F. F.; Williamson, R. G.; Quellette, F. A.; Edeen, M. A.; Lin, C. H.
1991-01-01
The paper examines the design and construction details of the test bed built for testing a solid-amine-based Regenerable CO2 Removal System (RCRS) at the NASA/Johnson Space Center for the extended Orbiter missions. The results of tests are presented, including those for the adsorption breakthrough and the adsorption and desorption of CO2 and H2O vapor. A model for predicting the performance of regenerative CO2 and H2O vapor adsorption by the solid amine system under various operating conditions was developed in parallel with the testing, using mass transfer coefficients calculated from test results. The simulations are shown to predict the adsorption performance of the Extended Duration Orbiter test bed fairly well. For application to the RCRS at various operating conditions, the model has to be modified.
Parallelization of fine-scale computation in Agile Multiscale Modelling Methodology
NASA Astrophysics Data System (ADS)
Macioł, Piotr; Michalik, Kazimierz
2016-10-01
Nowadays, multiscale modelling of material behavior is an extensively developed area. An important obstacle to its wide application is its high computational demands. Among other approaches, the parallelization of multiscale computations is a promising solution. Heterogeneous multiscale models are good candidates for parallelization, since communication between sub-models is limited. In this paper, the possibility of parallelizing multiscale models based on the Agile Multiscale Methodology framework is discussed. A sequential, FEM-based macroscopic model has been combined with concurrently computed fine-scale models employing the MatCalc thermodynamic simulator. The main issues investigated in this work are (i) the speed-up of multiscale models, with special focus on fine-scale computations, and (ii) the decrease in the quality of computations enforced by parallel execution. Speed-up has been evaluated on the basis of Amdahl's law equations. The problem of 'delay error', arising from the parallel execution of fine-scale sub-models controlled by the sequential macroscopic sub-model, is discussed. Some technical aspects of combining third-party commercial modelling software with an in-house multiscale framework and an MPI library are also discussed.
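For reference, the Amdahl's-law bound that such a speed-up evaluation rests on: with a fraction p of the work parallelizable over n processors, the speed-up is

    S(n) = \frac{1}{(1 - p) + p/n},

so the attainable speed-up saturates at 1/(1-p) however many fine-scale sub-models run concurrently.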
QCM-D on mica for parallel QCM-D-AFM studies.
Richter, Ralf P; Brisson, Alain
2004-05-25
Quartz crystal microbalance with dissipation monitoring (QCM-D) has developed into a recognized method to study adsorption processes in liquid, such as the formation of supported lipid bilayers and protein adsorption. However, the large intrinsic roughness of currently used gold-coated or silica-coated QCM-D sensors limits parallel structural characterization by atomic force microscopy (AFM). We present a method for coating QCM-D sensors with thin mica sheets operating in liquid with high stability and sensitivity. We define criteria to objectively assess the reliability of the QCM-D measurements and demonstrate that the mica-coated sensors can be used to follow the formation of supported lipid membranes and subsequent protein adsorption. This method allows combining QCM-D and AFM investigations on identical supports, providing detailed physicochemical and structural characterization of model membranes.
Ramírez-Miquet, Evelio E.; Perchoux, Julien; Loubière, Karine; Tronche, Clément; Prat, Laurent; Sotolongo-Costa, Oscar
2016-01-01
Optical feedback interferometry (OFI) is a compact sensing technique recently implemented for flow measurements in microchannels. We propose implementing OFI for the analysis of multiphase flows at the microscale, starting with the case of parallel flows of two immiscible fluids. The velocity profiles in each phase were measured and the interface location estimated for several operating conditions. To the authors' knowledge, this sensing technique is applied here for the first time to multiphase flows. Theoretical profiles derived from a model based on the Couette viscous flow approximation reproduce the experimental results fairly well. The sensing system and the analysis presented here provide a new tool for studying more complex interactions between immiscible fluids (such as liquid droplets flowing in a microchannel). PMID:27527178
NASA Astrophysics Data System (ADS)
Huang, J. D.; Liu, J. J.; Chen, Q. X.; Mao, N.
2017-06-01
Against a background of heat-treatment operations in mould manufacturing, a two-stage flow-shop scheduling problem is described for minimizing makespan with parallel batch-processing machines and re-entrant jobs. The weights and release dates of jobs are non-identical, but job processing times are equal. A mixed-integer linear programming model is developed and tested with small-scale scenarios. Given that the problem is NP-hard, three heuristic construction methods with polynomial complexity are proposed. The worst case of the new constructive heuristic is analysed in detail. A method for computing lower bounds is proposed to test heuristic performance. Heuristic efficiency is tested with sets of scenarios. Compared with the two improved heuristics, the performance of the new constructive heuristic is superior.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; ...
2017-10-10
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. Furthermore, the simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
NASA Astrophysics Data System (ADS)
Sahin, Gokhan; Kerimli, Genber
2018-03-01
This article presents a modeling study of the effect of the base depth on the photovoltaic conversion efficiency of a vertical parallel silicon solar cell. After solving the continuity equation for excess minority carriers, we calculated electrical parameters such as the photocurrent density, the photovoltage, the series and shunt resistances, the diffusion capacitance, the electric power, the fill factor, and the photovoltaic conversion efficiency. We determined the maximum electric power, the operating point of the solar cell, and the photovoltaic conversion efficiency as functions of the depth z in the base. We showed that the photocurrent density decreases with the depth z and that the photovoltage decreases as the base depth increases. The series and shunt resistances were deduced from the electrical model and are likewise influenced by the base depth, and the diffusion capacitance decreases with the depth z of the base.
Shi, Jingyuan Jolie; Smith, Sandi W
2016-01-01
This study examined the effect of moderately repeated exposure (three times) to a fear appeal message on the Extended Parallel Process Model (EPPM) variables of threat, efficacy, and behavioral intentions for the recommended behaviors in the message, as well as on the proportions of systematic and message-related thoughts generated after each message exposure. The results showed that after repeated exposure to a fear appeal message about preventing melanoma, perceived threat in terms of susceptibility and perceived efficacy in terms of response efficacy significantly increased. The behavioral intentions for all recommended behaviors did not change after repeated exposure to the message. However, after the second exposure the proportions of both systematic and all message-related thoughts (relative to total thoughts) significantly decreased while the proportion of heuristic thoughts significantly increased, and this pattern held after the third exposure. The findings demonstrate that the predictions of the EPPM are likely to remain operative after three exposures to a persuasive message.
Thermal modeling of a cryogenic turbopump for space shuttle applications.
NASA Technical Reports Server (NTRS)
Knowles, P. J.
1971-01-01
Thermal modeling of a cryogenic pump and a hot-gas turbine in a turbopump assembly proposed for the Space Shuttle is described in this paper. A model, developed by identifying the heat-transfer regimes and incorporating their dependencies into a turbopump system model, included heat transfer for two-phase cryogen, hot-gas (200 R) impingement on turbine blades, gas impingement on rotating disks, and parallel-plate fluid flow. The 'thermal analyzer' program employed to develop this model was the TRW Systems Improved Numerical Differencing Analyzer (SINDA). This program uses finite differencing with a lumped-parameter representation for each node. Also discussed are model development, simulations of turbopump startup/shutdown operations, and the effects of varying turbopump parameters on the thermal performance.
Bio-Inspired Neural Model for Learning Dynamic Models
NASA Technical Reports Server (NTRS)
Duong, Tuan; Duong, Vu; Suri, Ronald
2009-01-01
A neural-network mathematical model that, relative to prior such models, places greater emphasis on some of the temporal aspects of real neural physical processes, has been proposed as a basis for massively parallel, distributed algorithms that learn dynamic models of possibly complex external processes by means of learning rules that are local in space and time. The algorithms could be made to perform such functions as recognition and prediction of words in speech and of objects depicted in video images. The approach embodied in this model is said to be "hardware-friendly" in the following sense: The algorithms would be amenable to execution by special-purpose computers implemented as very-large-scale integrated (VLSI) circuits that would operate at relatively high speeds and low power demands.
Contingency Analysis Post-Processing With Advanced Computing and Visualization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Glaesemann, Kurt; Fitzhenry, Erin
Contingency analysis is a critical function widely used in energy management systems to assess the impact of power system component failures. Its outputs are important for power system operation for improved situational awareness, power system planning studies, and power market operations. With the increased complexity of power system modeling and simulation caused by increased energy production and demand, the penetration of renewable energy and fast deployment of smart grid devices, and the trend of operating grids closer to their capacity for better efficiency, more and more contingencies must be executed and analyzed quickly in order to ensure grid reliability and accuracy for the power market. Currently, many researchers have proposed different techniques to accelerate the computational speed of contingency analysis, but not much work has been published on how to post-process the large amount of contingency outputs quickly. This paper proposes a parallel post-processing function that can analyze contingency analysis outputs faster and display them in a web-based visualization tool to help power engineers improve their work efficiency by fast information digestion. Case studies using an ESCA-60 bus system and a WECC planning system are presented to demonstrate the functionality of the parallel post-processing technique and the web-based visualization tool.
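The shape of such a parallel post-processing pass can be sketched with a process pool that scans per-contingency output files; the file names and line format here are hypothetical:

    import glob
    from multiprocessing import Pool

    def summarize(path):
        # Reduce one contingency output file to a small summary record.
        with open(path) as f:
            violations = sum(1 for line in f if "VIOLATION" in line)
        return path, violations

    if __name__ == "__main__":
        with Pool() as pool:
            results = pool.map(summarize, glob.glob("contingency_*.out"))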
NASA Technical Reports Server (NTRS)
Harper, Richard E.; Babikyan, Carol A.; Butler, Bryan P.; Clasen, Robert J.; Harris, Chris H.; Lala, Jaynarayan H.; Masotto, Thomas K.; Nagle, Gail A.; Prizant, Mark J.; Treadwell, Steven
1994-01-01
The Army Avionics Research and Development Activity (AVRADA) is pursuing programs that would enable effective and efficient management of large amounts of situational data that occurs during tactical rotorcraft missions. The Computer Aided Low Altitude Night Helicopter Flight Program has identified automated Terrain Following/Terrain Avoidance, Nap of the Earth (TF/TA, NOE) operation as key enabling technology for advanced tactical rotorcraft to enhance mission survivability and mission effectiveness. The processing of critical information at low altitudes with short reaction times is life-critical and mission-critical, necessitating an ultra-reliable, high-throughput computing platform for dependable service for flight control, fusion of sensor data, route planning, near-field/far-field navigation, and obstacle avoidance operations. To address these needs the Army Fault Tolerant Architecture (AFTA) is being designed and developed. This computer system is based upon the Fault Tolerant Parallel Processor (FTPP) developed by Charles Stark Draper Labs (CSDL). AFTA is a hard real-time, Byzantine fault-tolerant parallel processor programmed in the Ada language. This document describes the results of the Detailed Design (Phases 2 and 3 of a 3-year project) of the AFTA development. This document contains detailed descriptions of the program objectives, the TF/TA NOE application requirements, architecture, hardware design, operating systems design, systems performance measurements, and analytical models.
Extending Validated Human Performance Models to Explore NextGen Concepts
NASA Technical Reports Server (NTRS)
Gore, Brian Francis; Hooey, Becky Lee; Mahlstedt, Eric; Foyle, David C.
2012-01-01
To meet the expected increases in air traffic demands, NASA and the FAA are researching and developing Next Generation Air Transportation System (NextGen) concepts. NextGen will require substantial increases in the data available to pilots on the flight deck (e.g., weather, wake, traffic trajectory predictions, etc.) to support more precise and closely coordinated operations (e.g., self-separation, RNAV/RNP, and closely spaced parallel operations, CSPOs). These NextGen procedures and operations, along with the pilot's roles and responsibilities, must be designed with consideration of the pilot's capabilities and limitations. Failure to do so will leave the pilots, and thus the entire aviation system, vulnerable to error. A validated Man-machine Integration Design and Analysis System (MIDAS) v5 model was extended to evaluate anticipated changes to flight deck and controller roles and responsibilities in NextGen approach and land operations. Compared to conditions in which the controllers are responsible for separation during the descent-to-land phase of flight, the output from these model predictions suggests that the flight deck response time to detect a lead-aircraft blunder will decrease, pilot scans of the navigation display will increase, and workload will increase.
Acoustic 3D modeling by the method of integral equations
NASA Astrophysics Data System (ADS)
Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.
2018-02-01
This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. Tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that the method can accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations and to parallelize across multiple sources. Practical examples and efficiency tests are presented as well.
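The FFT-accelerated matrix-vector product rests on the convolutional (Toeplitz) structure of the IE kernel. A 1-D sketch of the idea (the paper's operator is 3-D, and layered-medium kernels require additional terms):

    import numpy as np

    def toeplitz_matvec(kernel, x):
        # kernel has length 2n-1 and holds coefficients t[-(n-1)..n-1];
        # returns y[i] = sum_j t[i-j] * x[j] in O(n log n) via the FFT.
        n = x.size
        m = 2 * n - 1
        y = np.fft.ifft(np.fft.fft(kernel, m) * np.fft.fft(x, m)).real
        return y[n - 1:2 * n - 1]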
Autoplan: A self-processing network model for an extended blocks world planning environment
NASA Technical Reports Server (NTRS)
Dautrechy, C. Lynne; Reggia, James A.; Mcfadden, Frank
1990-01-01
Self-processing network models (neural/connectionist models, marker passing/message passing networks, etc.) are currently undergoing intense investigation for a variety of information processing applications. These models are potentially very powerful in that they support a large amount of explicit parallel processing, and they cleanly integrate high level and low level information processing. However, they are currently limited by a lack of understanding of how to apply them effectively in many application areas. The formulation of self-processing network methods for dynamic, reactive planning is studied. The long-term goal is to formulate robust, computationally effective information processing methods for the distributed control of semiautonomous exploration systems, e.g., the Mars Rover. The current research effort is focusing on hierarchical plan generation, execution, and revision through local operations in an extended blocks world environment. This scenario involves many challenging features that would be encountered in a real planning and control environment: multiple simultaneous goals, parallel as well as sequential action execution, action sequencing determined not only by goals and their interactions but also by limited resources (e.g., three tasks, two acting agents), the need to interpret unanticipated events and react appropriately through replanning, etc.
Determining collective barrier operation skew in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
2015-11-24
Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
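An mpi4py rendering of the measurement loop (the deliberate delay and timing calls are illustrative; the patent targets the compute nodes of a parallel machine rather than MPI ranks):

    import time
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    my_time = None
    for delayed in range(comm.size):
        if comm.rank == delayed:
            time.sleep(0.01)                 # the delay for this round
        t0 = MPI.Wtime()
        comm.Barrier()                       # the collective barrier operation
        if comm.rank == delayed:
            my_time = MPI.Wtime() - t0       # delayed node's completion time
    times = comm.allgather(my_time)          # one measurement per delayed node
    skew = max(times) - min(times)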
Labeled trees and the efficient computation of derivations
NASA Technical Reports Server (NTRS)
Grossman, Robert; Larson, Richard G.
1989-01-01
The effective parallel symbolic computation of operators under composition is discussed. Examples include differential operators under composition and vector fields under the Lie bracket. Data structures consisting of formal linear combinations of rooted labeled trees are discussed. A multiplication on rooted labeled trees is defined, thereby making the set of these data structures into an associative algebra. An algebra homomorphism is defined from the original algebra of operators into this algebra of trees. An algebra homomorphism from the algebra of trees into the algebra of differential operators is then described. The cancellation which occurs when noncommuting operators are expressed in terms of commuting ones occurs naturally when the operators are represented using this data structure. This leads to an algorithm which, for operators which are derivations, speeds up the computation exponentially in the degree of the operator. It is shown that the algebra of trees leads naturally to a parallel version of the algorithm.
Differential Draining of Parallel-Fed Propellant Tanks in Morpheus and Apollo Flight
NASA Technical Reports Server (NTRS)
Hurlbert, Eric; Guardado, Hector; Hernandez, Humberto; Desai, Pooja
2015-01-01
Parallel-fed propellant tanks are an advantageous configuration for many spacecraft. Parallel-fed tanks allow the center of gravity (cg) to be maintained over the engine(s), as opposed to serial-fed propellant tanks, which produce a cg shift as propellant is drained from one tank first and then the other. Parallel-fed tanks also allow for tank isolation if that is needed. Parallel tanks and feed systems have been used in several past vehicles, including the Apollo Lunar Module. The design of the feed system connecting the parallel tanks is critical to maintaining balance in the propellant tanks. The design must account for and minimize the effect of manufacturing variations that could cause delta-p or mass flowrate differences, which would lead to propellant imbalance. Other sources of differential draining are also discussed. Fortunately, physics provides some self-correcting behaviors that tend to equalize any initial imbalance. Whether active control of the propellant level in each tank is required or can be avoided is also an important question to answer. In order to provide data on parallel-fed tanks and differential draining in flight for cryogenic propellants (as well as any other fluid), a vertical test bed (flying lander) for terrestrial use was employed. The Morpheus vertical test bed is a parallel-fed propellant tank system that uses passive design to keep the propellant tanks balanced. The system is operated in blowdown. The Morpheus vehicle was instrumented with a capacitance level sensor in each propellant tank in order to measure the draining of propellants over 34 tethered and 12 free flights. Morpheus did experience an approximately 20 lbm imbalance in one pair of tanks; the cause of this imbalance is discussed. This paper discusses the analysis, design, flight simulation vehicle dynamic modeling, and flight test of the Morpheus parallel-fed propellant system. The Apollo LEM data is also examined in this summary report of the flight data.
Manyscale Computing for Sensor Processing in Support of Space Situational Awareness
NASA Astrophysics Data System (ADS)
Schmalz, M.; Chapman, W.; Hayden, E.; Sahni, S.; Ranka, S.
2014-09-01
Increasing image and signal data burden associated with sensor data processing in support of space situational awareness implies continuing computational throughput growth beyond the petascale regime. In addition to growing applications data burden and diversity, the breadth, diversity and scalability of high performance computing architectures and their various organizations challenge the development of a single, unifying, practicable model of parallel computation. Therefore, models for scalable parallel processing have exploited architectural and structural idiosyncrasies, yielding potential misapplications when legacy programs are ported among such architectures. In response to this challenge, we have developed a concise, efficient computational paradigm and software called Manyscale Computing to facilitate efficient mapping of annotated application codes to heterogeneous parallel architectures. Our theory, algorithms, software, and experimental results support partitioning and scheduling of application codes for envisioned parallel architectures, in terms of work atoms that are mapped (for example) to threads or thread blocks on computational hardware. Because of the rigor, completeness, conciseness, and layered design of our manyscale approach, application-to-architecture mapping is feasible and scalable for architectures at petascales, exascales, and above. Further, our methodology is simple, relying primarily on a small set of primitive mapping operations and support routines that are readily implemented on modern parallel processors such as graphics processing units (GPUs) and hybrid multi-processors (HMPs). In this paper, we overview the opportunities and challenges of manyscale computing for image and signal processing in support of space situational awareness applications. We discuss applications in terms of a layered hardware architecture (laboratory > supercomputer > rack > processor > component hierarchy). Demonstration applications include performance analysis and results in terms of execution time as well as storage, power, and energy consumption for bus-connected and/or networked architectures. The feasibility of the manyscale paradigm is demonstrated by addressing four principal challenges: (1) architectural/structural diversity, parallelism, and locality, (2) masking of I/O and memory latencies, (3) scalability of design as well as implementation, and (4) efficient representation/expression of parallel applications. Examples will demonstrate how manyscale computing helps solve these challenges efficiently on real-world computing systems.
Optimizing transformations of stencil operations for parallel cache-based architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bassetti, F.; Davis, K.
This paper describes a new technique for optimizing serial and parallel stencil and stencil-like operations for cache-based architectures. The technique takes advantage of the semantic knowledge implicit in stencil-like computations. It is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor-of-two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation and applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for 1-D tiling on a single processor, and in parallel using a 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has been performed for the 2-D tiling case; however, for the parallel case 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.
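A sketch of the 1-D tiling pattern on the same 5-point Jacobi test problem, written in Python/numpy rather than the paper's transformed C code (the block size is a tunable placeholder):

    import numpy as np

    def jacobi_sweep_tiled(u, f, h2, block=64):
        # One Jacobi sweep for the Poisson equation (laplacian u = f) on a
        # uniform grid, processed in row blocks so each tile stays in cache.
        new = u.copy()
        n = u.shape[0]
        for i0 in range(1, n - 1, block):
            i1 = min(i0 + block, n - 1)
            new[i0:i1, 1:-1] = 0.25 * (u[i0-1:i1-1, 1:-1] + u[i0+1:i1+1, 1:-1]
                                       + u[i0:i1, :-2] + u[i0:i1, 2:]
                                       - h2 * f[i0:i1, 1:-1])
        return new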
Parallel Tensor Compression for Large-Scale Scientific Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed-memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
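A serial numpy sketch of the underlying computation, a sequentially truncated higher-order SVD, makes the per-mode structure visible (the paper's contribution is the distributed-memory version of exactly these unfold/SVD/mode-multiply steps; ranks are illustrative):

    import numpy as np

    def mode_multiply(T, A, mode):
        # Multiply tensor T by matrix A along one mode.
        Tm = np.moveaxis(T, mode, 0)
        out = A @ Tm.reshape(Tm.shape[0], -1)
        return np.moveaxis(out.reshape((A.shape[0],) + Tm.shape[1:]), 0, mode)

    def tucker(X, ranks):
        core, factors = X, []
        for mode, r in enumerate(ranks):
            unfold = np.moveaxis(core, mode, 0).reshape(core.shape[mode], -1)
            U = np.linalg.svd(unfold, full_matrices=False)[0][:, :r]
            factors.append(U)
            core = mode_multiply(core, U.T, mode)  # truncate this mode to rank r
        return core, factors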
NASA Technical Reports Server (NTRS)
Waheed, Abdul; Yan, Jerry
1998-01-01
This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With the increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of the NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We report measurement-based performance of these parallelized benchmarks from four perspectives: efficacy of the parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized versions of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives, but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the presented approach on applications in disciplines other than the aerospace industry is assessed.
Real-time computing platform for spiking neurons (RT-spike).
Ros, Eduardo; Ortigosa, Eva M; Agís, Rodrigo; Carrillo, Richard; Arnold, Michael
2006-07-01
A computing platform is described for simulating arbitrary networks of spiking neurons in real time. A hybrid computing scheme is adopted that uses both software and hardware components to manage the tradeoff between flexibility and computational power; the neuron model is implemented in hardware and the network model and the learning are implemented in software. The incremental transition of the software components into hardware is supported. We focus on a spike response model (SRM) for a neuron where the synapses are modeled as input-driven conductances. The temporal dynamics of the synaptic integration process are modeled with a synaptic time constant that results in a gradual injection of charge. This type of model is computationally expensive and is not easily amenable to existing software-based event-driven approaches. As an alternative we have designed an efficient time-based computing architecture in hardware, where the different stages of the neuron model are processed in parallel. Further improvements occur by computing multiple neurons in parallel using multiple processing units. This design is tested using reconfigurable hardware and its scalability and performance evaluated. Our overall goal is to investigate biologically realistic models for the real-time control of robots operating within closed action-perception loops, and so we evaluate the performance of the system on simulating a model of the cerebellum where the emulation of the temporal dynamics of the synaptic integration process is important.
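A simplified software rendering of that synaptic stage, an exponentially decaying conductance that injects charge gradually into a leaky membrane (constants are illustrative; the platform's actual SRM pipeline runs in hardware):

    import numpy as np

    def step(v, g, spikes, w, dt, tau_syn=5e-3, tau_m=20e-3):
        # The conductance decays with its own time constant and jumps when
        # presynaptic spikes arrive, giving gradual charge injection.
        g = g * np.exp(-dt / tau_syn) + w @ spikes
        # The leaky membrane integrates the synaptic drive (rest level at 0).
        v = v + (dt / tau_m) * (-v + g)
        return v, g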
SciSpark: In-Memory Map-Reduce for Earth Science Algorithms
NASA Astrophysics Data System (ADS)
Ramirez, P.; Wilson, B. D.; Whitehall, K. D.; Palamuttam, R. S.; Mattmann, C. A.; Shah, S.; Goodman, A.; Burke, W.
2016-12-01
We are developing a lightning-fast Big Data technology called SciSpark based on Apache Spark under a NASA AIST grant (PI Mattmann). Spark implements the map-reduce paradigm for parallel computing on a cluster, but emphasizes in-memory computation, "spilling" to disk only as needed, and so outperforms the disk-based Apache Hadoop by 100x in memory and by 10x on disk. SciSpark extends Spark to support Earth Science use in three ways: efficient ingest of N-dimensional geo-located arrays (physical variables) from netCDF3/4, HDF4/5, and/or OPeNDAP URLs; array operations for dense arrays in Scala and Java using the ND4S/ND4J or Breeze libraries; and operations to "split" datasets across a Spark cluster by time or space or both. For example, a decade-long time series of geo-variables can be split across time to enable parallel "speedups" of analysis by day, month, or season. Similarly, very high-resolution climate grids can be partitioned into spatial tiles for parallel operations across rows, columns, or blocks. In addition, using Spark's gateway into Python, PySpark, one can utilize the entire ecosystem of numpy, scipy, etc. Finally, SciSpark Notebooks provide a modern eNotebook technology in which Scala, Python, or Spark-SQL codes are entered into cells in the Notebook and executed on the cluster, with results, plots, or graph visualizations displayed in "live widgets". We have exercised SciSpark by implementing three complex Use Cases: discovery and evolution of Mesoscale Convective Complexes (MCCs) in storms, yielding a graph of connected components; PDF clustering of atmospheric state using parallel K-Means; and statistical "rollups" of geo-variables or model-to-obs. differences (i.e. mean, stddev, skewness, & kurtosis) by day, month, season, year, and multi-year. Geo-variables are ingested and split across the cluster using methods on the sciSparkContext object, including netCDFVariables() for spatial decomposition and wholeNetCDFVariables() for time series. The presentation will cover the architecture of SciSpark, the design of the scientific RDD (sRDD) data structures for N-dimensional arrays, results from the three science Use Cases, example Notebooks, lessons learned from the algorithm implementations, and parallel performance metrics.
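SciSpark itself is written in Scala; the fragment below is a hedged plain-PySpark analogue of the "split by time" idea (the daily grids are synthetic stand-ins, and the sciSparkContext methods named above are not used):

```python
from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("split-by-time-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical stand-in for a decade of daily 2D geo-grids: (day, array) pairs.
days = [(d, np.random.rand(90, 180)) for d in range(3650)]
rdd = sc.parallelize(days, numSlices=120)   # split the time series across the cluster

# Parallel "rollup": spatial mean per day, then aggregation by (30-day) month.
monthly_means = (rdd.map(lambda kv: (kv[0] // 30, float(kv[1].mean())))
                    .groupByKey()
                    .mapValues(lambda xs: sum(xs) / len(xs)))
print(monthly_means.take(3))
```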
Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R
2012-01-01
This paper describes a kernel scheduling algorithm that is based on coscheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
B. Hendrickson; T.G. Kolda
1998-09-01
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
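A hedged toy illustration of why the bipartite view arises: under any row partition, the x-entries a processor needs are exactly the columns its rows touch, and the transpose product reverses the roles of rows and columns. The partition below is a naive block split, not one of the paper's algorithms.

```python
import numpy as np
from scipy import sparse

# Toy rectangular sparse matrix and a 2-way row partition.
A = sparse.random(6, 8, density=0.3, format="csr", random_state=0)
parts = [list(range(0, 3)), list(range(3, 6))]

for p, rows in enumerate(parts):
    block = A[rows, :]
    needed_cols = np.unique(block.indices)   # x-entries this part must receive
    print(f"part {p}: owns rows {rows}, needs {len(needed_cols)} x-entries")

# Each part computes its slice of y = A @ x; A^T @ v reverses the roles of
# rows and columns, which is why both products fit one bipartite model.
x = np.ones(8)
y = np.concatenate([A[rows, :] @ x for rows in parts])
assert np.allclose(y, A @ x)
```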
Collectively loading programs in a multiple program multiple data environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, the nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader, which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
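A minimal mpi4py sketch of the load-leader idea under stated assumptions: the program-to-node mapping and file names are hypothetical, and an MPI sub-communicator stands in for the class route described above.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Hypothetical load description: which program each node should run.
program_of_node = {0: "prog_a", 1: "prog_a", 2: "prog_b", 3: "prog_b"}
my_program = program_of_node[rank % len(program_of_node)]

# One sub-communicator per program plays the role of a "class route".
route = comm.Split(color=hash(my_program) % (2**30), key=rank)

# The route's rank-0 node acts as load leader: it alone touches the
# file system, then broadcasts the program image to the other nodes.
if route.Get_rank() == 0:
    with open(my_program + ".bin", "rb") as f:   # hypothetical file name
        image = f.read()
else:
    image = None
image = route.bcast(image, root=0)
```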
Anatomically constrained neural network models for the categorization of facial expression
NASA Astrophysics Data System (ADS)
McMenamin, Brenton W.; Assadi, Amir H.
2004-12-01
The ability to recognize facial expression in humans is performed with the amygdala which uses parallel processing streams to identify the expressions quickly and accurately. Additionally, it is possible that a feedback mechanism may play a role in this process as well. Implementing a model with similar parallel structure and feedback mechanisms could be used to improve current facial recognition algorithms for which varied expressions are a source for error. An anatomically constrained artificial neural-network model was created that uses this parallel processing architecture and feedback to categorize facial expressions. The presence of a feedback mechanism was not found to significantly improve performance for models with parallel architecture. However the use of parallel processing streams significantly improved accuracy over a similar network that did not have parallel architecture. Further investigation is necessary to determine the benefits of using parallel streams and feedback mechanisms in more advanced object recognition tasks.
NASA Astrophysics Data System (ADS)
Hipp, J. R.; Encarnacao, A.; Ballard, S.; Young, C. J.; Phillips, W. S.; Begnaud, M. L.
2011-12-01
Recently our combined SNL-LANL research team has succeeded in developing a global, seamless 3D tomographic P-velocity model (SALSA3D) that provides superior first-P travel time predictions at both regional and teleseismic distances. However, given the variable data quality and uneven data sampling associated with this type of model, it is essential that there be a means to calculate high-quality estimates of the path-dependent variance and covariance associated with the predicted travel times of ray paths through the model. In this paper, we show a methodology for accomplishing this by exploiting the full model covariance matrix. Our model has on the order of 1/2 million nodes, so the challenge in calculating the covariance matrix is formidable: 0.9 TB storage for 1/2 of a symmetric matrix, necessitating an out-of-core (OOC) blocked matrix solution technique. With our approach the tomography matrix (G, which includes Tikhonov regularization terms) is multiplied by its transpose (G^T G) and written in a blocked sub-matrix fashion. We employ a distributed parallel solution paradigm that solves for (G^T G)^-1 by assigning blocks to individual processing nodes for matrix decomposition, update, and scaling operations. We first find the Cholesky decomposition of G^T G, which is subsequently inverted. Next, we employ OOC matrix multiply methods to calculate the model covariance matrix from (G^T G)^-1 and an assumed data covariance matrix. Given the model covariance matrix we solve for the travel-time covariance associated with arbitrary ray paths by integrating the model covariance along both ray paths. Setting the paths equal gives the variance for that path. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
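At toy scale the whole pipeline fits in memory; the numpy/scipy sketch below follows the steps as described (Cholesky of G^T G, inversion, model covariance from an assumed data covariance, then ray-path integration), with all matrices randomly generated for illustration only.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Toy dimensions; the real matrix (~5e5 nodes) needs the out-of-core
# blocked approach described above.
m, n = 200, 50
G = np.random.rand(m, n)              # tomography matrix incl. regularization rows
Cd = 0.01 * np.eye(m)                 # assumed data covariance

GtG = G.T @ G
cf = cho_factor(GtG)                  # Cholesky decomposition of G^T G
GtG_inv = cho_solve(cf, np.eye(n))    # (G^T G)^-1

# Model covariance from (G^T G)^-1 and the data covariance.
Cm = GtG_inv @ G.T @ Cd @ G @ GtG_inv

# Travel-time covariance between two rays: integrate the model covariance
# along both paths (r holds each ray's sensitivity to every model node).
r1, r2 = np.random.rand(n), np.random.rand(n)
cov_tt = r1 @ Cm @ r2
var_tt = r1 @ Cm @ r1                 # setting the paths equal gives variance
```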
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Serial vs. parallel models of attention in visual search: accounting for benchmark RT-distributions.
Moran, Rani; Zehetleitner, Michael; Liesefeld, Heinrich René; Müller, Hermann J; Usher, Marius
2016-10-01
Visual search is central to the investigation of selective visual attention. Classical theories propose that items are identified by serially deploying focal attention to their locations. While this accounts for set-size effects over a continuum of task difficulties, it has been suggested that parallel models can account for such effects equally well. We compared the serial Competitive Guided Search model with a parallel model in their ability to account for RT distributions and error rates from a large visual search data-set featuring three classical search tasks: 1) a spatial configuration search (2 vs. 5); 2) a feature-conjunction search; and 3) a unique feature search (Wolfe, Palmer & Horowitz Vision Research, 50(14), 1304-1311, 2010). In the parallel model, each item is represented by a diffusion to two boundaries (target-present/absent); the search corresponds to a parallel race between these diffusors. The parallel model was highly flexible in that it allowed both for a parametric range of capacity-limitation and for set-size adjustments of identification boundaries. Furthermore, a quit unit allowed for a continuum of search-quitting policies when the target is not found, with "single-item inspection" and exhaustive searches comprising its extremes. The serial model was found to be superior to the parallel model, even before penalizing the parallel model for its increased complexity. We discuss the implications of the results and the need for future studies to resolve the debate.
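A stripped-down simulation of the parallel-race idea follows; it omits the capacity limitation, boundary adjustments, and quit unit of the fitted model, so it is a hedged caricature rather than the authors' parallel model.

```python
import numpy as np

def race_trial(n_items, drift_target=0.3, drift_dist=-0.2, noise=1.0,
               bound=10.0, dt=1.0, rng=np.random.default_rng(0)):
    """One visual-search trial as a parallel race of diffusors.

    Each item diffuses toward a 'present' (+bound) or 'absent' (-bound)
    boundary; the first 'present' decision ends the trial, while 'absent'
    requires exhaustively rejecting every item."""
    drifts = np.full(n_items, drift_dist)
    drifts[0] = drift_target                 # first item is the target
    x = np.zeros(n_items)
    done = np.zeros(n_items, dtype=bool)
    t = 0.0
    while True:
        t += dt
        x += np.where(done, 0.0, drifts * dt +
                      noise * np.sqrt(dt) * rng.standard_normal(n_items))
        if (x >= bound).any():               # any 'present' decision wins the race
            return t, "present"
        done |= x <= -bound                  # 'absent' decisions accumulate
        if done.all():
            return t, "absent"
```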
NASA Astrophysics Data System (ADS)
Harmon, Frederick G.
2005-11-01
Parallel hybrid-electric propulsion systems would be beneficial for small unmanned aerial vehicles (UAVs) used for military, homeland security, and disaster-monitoring missions. The benefits, due to the hybrid and electric-only modes, include increased time-on-station and greater range compared to electric-powered UAVs, as well as stealth modes not available with gasoline-powered UAVs. This dissertation contributes to the research fields of small unmanned aerial vehicles, hybrid-electric propulsion system control, and intelligent control. A conceptual design of a small UAV with a parallel hybrid-electric propulsion system is provided. The UAV is intended for intelligence, surveillance, and reconnaissance (ISR) missions. The conceptual design reveals the trade-offs that must be considered to take advantage of the hybrid-electric propulsion system. The resulting hybrid-electric propulsion system is a two-point design that includes an engine primarily sized for cruise speed and an electric motor and battery pack that are primarily sized for a slower endurance speed. The electric motor provides additional power for take-off, climbing, and acceleration, and also serves as a generator during charge-sustaining operation or regeneration. The intelligent control of the hybrid-electric propulsion system is based on an instantaneous optimization algorithm that generates a hyper-plane from the nonlinear efficiency maps for the internal combustion engine, electric motor, and lithium-ion battery pack. The hyper-plane incorporates charge-depletion and charge-sustaining strategies. The optimization algorithm is flexible and allows the operator/user to assign relative importance between the use of gasoline, electricity, and recharging depending on the intended mission. A MATLAB/Simulink model was developed to test the control algorithms. The Cerebellar Model Arithmetic Computer (CMAC) associative memory neural network is applied to the control of the UAV's parallel hybrid-electric propulsion system. The CMAC neural network approximates the hyper-plane generated from the instantaneous optimization algorithm and produces torque commands for the internal combustion engine and electric motor. The CMAC neural network controller reduces the required memory compared to a large look-up table by two orders of magnitude, and avoids having to compute a hyper-plane or complex logic at every time step.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Paul, P.; Bhattacharyya, D.; Turton, R.
2012-01-01
Future integrated gasification combined cycle (IGCC) power plants with CO2 capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured, while a number of them can be measured but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. The SP algorithm is developed for a selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO2 capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation, assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states, and the algorithm thereby captures the effects of sensor placement on the overall plant efficiency. The optimization problem is solved by a Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sensor sets and the long time that it takes to solve the constrained optimization problem, which includes more than 1000 states, solution of this problem is computationally expensive. To reduce the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing Toolbox from MathWorks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.
Parallelized CCHE2D flow model with CUDA Fortran on Graphics Processing Units
USDA-ARS?s Scientific Manuscript database
This paper presents the CCHE2D implicit flow model parallelized using CUDA Fortran programming technique on Graphics Processing Units (GPUs). A parallelized implicit Alternating Direction Implicit (ADI) solver using Parallel Cyclic Reduction (PCR) algorithm on GPU is developed and tested. This solve...
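Since the abstract is truncated, here is a hedged CPU reference implementation of Parallel Cyclic Reduction in numpy. Every reduction step is a pure vector operation, which is what makes PCR map well to GPU threads; this is generic PCR, not the CCHE2D solver itself.

```python
import numpy as np

def pcr_tridiag(a, b, c, d):
    """Solve a tridiagonal system by Parallel Cyclic Reduction.

    a, b, c are the sub-, main-, and super-diagonals; d is the RHS.
    At stride s, each row eliminates its couplings to rows i-s and i+s,
    so after ceil(log2(n)) steps every unknown is decoupled."""
    a, b, c, d = (np.asarray(v, float).copy() for v in (a, b, c, d))
    n = len(d)

    def at(v, idx, fill):
        """Gather v[idx]; out-of-range rows act as identity equations."""
        out = np.full(n, fill)
        ok = (idx >= 0) & (idx < n)
        out[ok] = v[idx[ok]]
        return out

    s = 1
    while s < n:
        im, ip = np.arange(n) - s, np.arange(n) + s
        alpha = -a / at(b, im, 1.0)
        beta = -c / at(b, ip, 1.0)
        d = d + alpha * at(d, im, 0.0) + beta * at(d, ip, 0.0)
        bn = b + alpha * at(c, im, 0.0) + beta * at(a, ip, 0.0)
        a = alpha * at(a, im, 0.0)
        c = beta * at(c, ip, 0.0)
        b = bn
        s *= 2
    return d / b

# Quick check against a dense solve on a diagonally dominant system.
rng = np.random.default_rng(1)
n = 8
a, c, d = rng.random(n), rng.random(n), rng.random(n)
b = 4.0 + rng.random(n)
a[0] = c[-1] = 0.0
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(pcr_tridiag(a, b, c, d), np.linalg.solve(A, d))
```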
Lu, Zhao; Sun, Jing; Butts, Kenneth
2016-02-03
A giant leap has been made in the past couple of decades with the introduction of kernel-based learning as a mainstay for designing effective nonlinear computational learning algorithms. In view of the geometric interpretation of conditional expectation and the ubiquity of multiscale characteristics in highly complex nonlinear dynamic systems [1]-[3], this paper presents a new orthogonal projection operator wavelet kernel, aiming at developing an efficient computational learning approach for nonlinear dynamical system identification. In the framework of multiresolution analysis, the proposed projection operator wavelet kernel can fulfill the multiscale, multidimensional learning to estimate complex dependencies. The special advantage of the projection operator wavelet kernel developed in this paper lies in the fact that it has a closed-form expression, which greatly facilitates its application in kernel learning. To the best of our knowledge, it is the first closed-form orthogonal projection wavelet kernel reported in the literature. It provides a link between grid-based wavelets and mesh-free kernel-based methods. Simulation studies for identifying the parallel models of two benchmark nonlinear dynamical systems confirm its superiority in model accuracy and sparsity.
Exploiting analytics techniques in CMS computing monitoring
NASA Astrophysics Data System (ADS)
Bonacorsi, D.; Kuznetsov, V.; Magini, N.; Repečka, A.; Vaandering, E.
2017-10-01
The CMS experiment has collected an enormous volume of metadata about its computing operations in its monitoring systems, describing its experience in operating all of the CMS workflows on all of the Worldwide LHC Computing Grid Tiers. Data mining efforts over all this information have rarely been undertaken, but are of crucial importance for a better understanding of how CMS achieved successful operations, and for reaching an adequate and adaptive model of CMS operations that allows detailed optimizations and eventually a prediction of system behaviour. These data are now streamed into the CERN Hadoop data cluster for further analysis. Specific sets of information (e.g. data on how many replicas of datasets CMS wrote on disks at WLCG Tiers, data on which datasets were primarily requested for analysis, etc.) were collected on Hadoop and processed with MapReduce applications profiting from the parallelization on the Hadoop cluster. We present the implementation of new monitoring applications on Hadoop, and discuss the new possibilities in CMS computing monitoring introduced by the ability to quickly process big data sets from multiple sources, looking forward to a predictive modeling of the system.
Examining Parallelism of Sets of Psychometric Measures Using Latent Variable Modeling
ERIC Educational Resources Information Center
Raykov, Tenko; Patelis, Thanos; Marcoulides, George A.
2011-01-01
A latent variable modeling approach that can be used to examine whether several psychometric tests are parallel is discussed. The method consists of sequentially testing the properties of parallel measures via a corresponding relaxation of parameter constraints in a saturated model or an appropriately constructed latent variable model. The…
Robot acting on moving bodies (RAMBO): Preliminary results
NASA Technical Reports Server (NTRS)
Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madju; Harwood, David
1989-01-01
A robot system called RAMBO is being developed. It is equipped with a camera and, given a sequence of simple tasks, can perform these tasks on a moving object. RAMBO is given a complete geometric model of the object. A low-level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative locations near the object sufficient for achieving the tasks. More specifically, low-level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose estimation is a Hough transform method accumulating position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the estimate of the position in space of a triple of features is obtained by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows the use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Trajectories are then created using parametric cubic splines between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
Wake vortex effects on parallel runway operations
DOT National Transportation Integrated Search
2003-01-06
Aircraft wake vortex behavior in ground effect between two parallel runways at Frankfurt/Main International Airport was studied. The distance and time of vortex demise were examined as a function of crosswind, aircraft type, and a measure of atmosphe...
Analysis and design of a high power, digitally-controlled spacecraft power system
NASA Technical Reports Server (NTRS)
Lee, F. C.; Cho, B. H.
1990-01-01
The progress to date on the analysis and design of a high power, digitally controlled spacecraft power system is described. Several battery discharger topologies were compared for use in the space platform application. Updated information has been provided on the battery voltage specification: initially it was thought to be in the 30 to 40 V range, but it is now specified to be 53 V to 84 V. This eliminated the tapped-boost and the current-fed auto-transformer converters from consideration. After consultations with NASA, it was decided to trade off the following topologies: (1) boost converter; (2) multi-module, multi-phase boost converter; and (3) voltage-fed push-pull with auto-transformer. A non-linear design optimization software tool was employed to facilitate an objective comparison; non-linear design optimization ensures that the best design of each topology is compared. The results indicate that a four-module boost converter with each module operating 90 degrees out of phase is the optimum converter for the space platform. Large-signal and small-signal models were generated for the shunt, charger, discharger, battery, and the mode controller. The models were first tested individually according to the space platform power system specifications supplied by NASA. The effect of battery voltage imbalance on parallel dischargers was investigated with respect to dc and small-signal responses. Similarly, the effects of paralleling dischargers and chargers were also investigated. A solar array and shunt model was included in these simulations. A model for the bus mode controller (power control unit) was also developed to interface the Orbital Replacement Unit (ORU) model to the platform power system. Small-signal models were used to generate the bus impedance plots in the various operating modes. The large-signal models were integrated into a system model, and time domain simulations were performed to verify bus regulation during mode transitions. Some changes have subsequently been incorporated into the models, including the use of a four-module boost discharger and a new model for the mode controller which includes the effects of saturation. The new simulations for the boost discharger show the improvement in bus ripple that can be achieved by phase-shifted operation of each of the boost modules.
Modelling and study of active vibration control for off-road vehicle
NASA Astrophysics Data System (ADS)
Zhang, Junwei; Chen, Sizhong
2014-05-01
In view of their special working characteristics and structure, engineering machines typically have no conventional suspension system. Consequently, operators have to endure severe vibrations which are detrimental both to their health and to the productivity of the loader. Based on displacement control, a kind of active damping method is developed for a skid-steer loader. In this paper, the whole hydraulic system for the active damping method is modelled, including the swash plate dynamics model, proportional valve model, piston accumulator model, pilot-operated check valve model, relief valve model, pump loss model, and cylinder model. A new road excitation model is developed specially for the skid-steer loader. The response of chassis vibration acceleration to road excitation is verified through simulation. The simulation result of passive accumulator damping is compared with measurements, and the comparison shows that they are close. Based on this, a parallel PID controller and a tracking PID controller with acceleration feedback are brought into the simulation model, and the simulation results are compared with passive accumulator damping. This shows that the active damping methods with PID controllers are better at reducing chassis vibration acceleration and pitch movement. Finally, test work for the active damping method is proposed as future work.
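For reference, a discrete PID update of the textbook form used in such controllers is sketched below; the gains and the acceleration measurements are hypothetical, not the paper's tuned values.

```python
def pid_step(error, state, kp=2.0, ki=0.5, kd=0.1, dt=0.01):
    """One update of a discrete PID controller (generic textbook form)."""
    integral, prev_error = state
    integral += error * dt                      # accumulate the I term
    derivative = (error - prev_error) / dt      # finite-difference D term
    u = kp * error + ki * integral + kd * derivative
    return u, (integral, error)

# Acceleration-feedback flavour: drive measured chassis acceleration to zero.
state = (0.0, 0.0)
for accel in [0.8, 0.6, 0.3, 0.1]:              # hypothetical measurements
    u, state = pid_step(0.0 - accel, state)
    print(round(u, 3))
```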
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Nadkarni, P. M.; Miller, P. L.
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chrisochoides, N.; Sukup, F.
In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed-memory multicomputers using the traditional data-parallel model has proven very inefficient due to the excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate the synchronization costs inherent to data-parallel methods by exploiting concurrency at the processor level. Our preliminary performance data indicate that the task-parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimal overhead compared to the "best" sequential implementation of the BW algorithm.
Terminal Area Procedures for Paired Runways
NASA Technical Reports Server (NTRS)
Lozito, Sandy
2011-01-01
Parallel runway operations have been found to increase capacity within the National Airspace System (NAS); however, poor visibility conditions reduce this capacity [1]. Much research has been conducted to examine the concepts and procedures related to parallel runways; however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (±10 s error) at a coupling point about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and a prototype future automation), as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Data showed that the operations in this study were acceptable and safe. Workload when using the pairing procedures and tools was generally low for both controllers and pilots, and situation awareness (SA) was typically moderate to high. There were some differences based upon the display and automation conditions for the pilots. Future research should consider the refinement of the concepts and tools for pilot and controller displays and automation for parallel runway concepts.
NASA Astrophysics Data System (ADS)
Rosa, Benoit; Brient, Antoine; Samper, Serge; Hascoët, Jean-Yves
2016-12-01
Mastering additive laser manufacturing surfaces is a real challenge and would allow functional surfaces to be obtained without finishing. Direct Metal Deposition (DMD) surfaces are composed of directional and chaotic textures that are directly linked to the process principles. The aim of this work is to obtain surface topographies by mastering the operating process parameters. Based on experimental investigation, the influence of operating parameters on the surface finish has been modeled. Topography parameters and multi-scale analysis have been used to characterize the DMD-obtained surfaces. This study also proposes a methodology to characterize DMD chaotic texture through topography filtering and 3D image treatment. In parallel, a new parameter is proposed: density of particles (D_p). Finally, this study proposes a regression model between process parameters and the density-of-particles parameter.
Optical systolic solutions of linear algebraic equations
NASA Technical Reports Server (NTRS)
Neuman, C. P.; Casasent, D.
1984-01-01
The philosophy of, and the data encoding possible in, systolic array optical processors (SAOP) is reviewed. The multitude of linear algebraic operations achievable on this architecture is examined. These operations include such linear algebraic algorithms as matrix decomposition, direct and indirect solutions, implicit and explicit methods for partial differential equations, eigenvalue and eigenvector calculations, and singular value decomposition. This architecture can be utilized to realize general techniques for solving matrix linear and nonlinear algebraic equations, least mean square error solutions, FIR filters, and nested-loop algorithms for control engineering applications. The data flow and pipelining of operations, design of parallel algorithms and flexible architectures, application of these architectures to computationally intensive physical problems, error-source modeling of optical processors, and matching of the computational needs of practical engineering problems to the capabilities of optical processors are emphasized.
A transient analysis of frost formation on a parallel plate evaporator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martinez-Frias, J.; Aceves, S.M.; Hernandez-Guerrero, A.
1996-12-31
This paper presents the development of a transient model for evaluating frost formation on a parallel plate evaporator for heat pump applications. The model treats the frost layer as a porous substance, and applies the equations of conservation of mass, momentum and energy to calculate the growth and densification of the frost layer. Empirical correlations for thermal conductivity and tortuosity as a function of density are incorporated from previous studies. Frost growth is calculated as a function of time, Reynolds number, longitudinal location, plate temperature, and ambient air temperature and humidity. The main assumptions are: ideal gas behavior for air and water vapor; uniform frost density and thermal conductivity across the thickness of the frost layer; and quasi-steady conditions during the whole process. The mathematical model is validated by comparing the predicted values of frost thickness and frost density with results obtained in recent experimental studies. A good agreement was obtained in the comparison. The frost formation model calculates the pressure drop and heat transfer resistance that result from the existence of the frost layer, and it can therefore be incorporated into a heat pump model to evaluate performance losses due to frosting as a function of weather conditions and time of operation since the last evaporator defrost.
A Study of Parallel Software Development with HPF and MPI for Composite Process Modeling Simulations
2011-01-01
Role of Academic Drug Discovery in the Quest for New CNS Therapeutics.
Yokley, Brian H; Hartman, Matthew; Slusher, Barbara S
2017-03-15
There was a greater than 50% decline in central nervous system (CNS) drug discovery and development programs by major pharmaceutical companies from 2009 to 2014. This decline was paralleled by a rise in the number of university led drug discovery centers, many in the CNS area, and a growth in the number of public-private drug discovery partnerships. Diverse operating models have emerged as the academic drug discovery centers adapt to this changing ecosystem.
Heavy Duty Diesel Truck and Bus Hybrid Powertrain Study
2012-03-01
electric 22 ft. bus that offers greater range than battery-electric buses can provide. Designed to seat 22 passengers plus standees, this Ebus model...system that has both parallel and series operating modes. The relatively low volume of many truck and bus designs has inhibited the development of...that battery packs need to be designed for 50,000 lifetime energy storage cycles in a hybrid transit bus vs. just 3,600 cycles in the typical
Reliable, Memory Speed Storage for Cluster Computing Frameworks
2014-06-16
specification API that can capture computations in many of today’s popular data-parallel computing models, e.g., MapReduce and SQL. We also ported the Hadoop ...today’s big data workloads: • Immutable data: Data is immutable once written, since dominant underlying storage systems, such as HDFS [3], only support... network transfers, so reads can be data-local. • Program size vs. data size: In big data processing, the same operation is repeatedly applied on massive
Large-scale parallel lattice Boltzmann-cellular automaton model of two-dimensional dendritic growth
NASA Astrophysics Data System (ADS)
Jelinek, Bohumir; Eshraghi, Mohsen; Felicelli, Sergio; Peters, John F.
2014-03-01
An extremely scalable lattice Boltzmann (LB)-cellular automaton (CA) model for simulations of two-dimensional (2D) dendritic solidification under forced convection is presented. The model incorporates effects of phase change, solute diffusion, melt convection, and heat transport. The LB model represents the diffusion, convection, and heat transfer phenomena. The dendrite growth is driven by a difference between actual and equilibrium liquid composition at the solid-liquid interface. The CA technique is deployed to track the new interface cells. The computer program was parallelized using the Message Passing Interface (MPI) technique. Parallel scaling of the algorithm was studied and major scalability bottlenecks were identified. Efficiency loss attributable to the high memory bandwidth requirement of the algorithm was observed when using multiple cores per processor. Parallel writing of the output variables of interest was implemented in the binary Hierarchical Data Format 5 (HDF5) to improve the output performance, and to simplify visualization. Calculations were carried out in single precision arithmetic without significant loss in accuracy, resulting in a 50% reduction of memory and computational time requirements. The presented solidification model shows very good scalability up to centimeter-size domains, including more than ten million dendrites. Catalogue identifier: AEQZ_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEQZ_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, UK. Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 29,767. No. of bytes in distributed program, including test data, etc.: 3,131,367. Distribution format: tar.gz. Programming language: Fortran 90. Computer: Linux PC and clusters. Operating system: Linux. Has the code been vectorized or parallelized?: Yes, the program is parallelized using MPI. Number of processors used: 1-50,000. RAM: memory requirements depend on the grid size. Classification: 6.5, 7.7. External routines: MPI (http://www.mcs.anl.gov/research/projects/mpi/), HDF5 (http://www.hdfgroup.org/HDF5/). Nature of problem: dendritic growth in undercooled Al-3 wt% Cu alloy melt under forced convection. Solution method: the lattice Boltzmann model solves the diffusion, convection, and heat transfer phenomena; the cellular automaton technique is deployed to track the solid/liquid interface. Restrictions: heat transfer is calculated uncoupled from the fluid flow; thermal diffusivity is constant. Unusual features: a novel technique, utilizing periodic duplication of a pre-grown "incubation" domain, is applied for the scale-up test. Running time: varies from minutes to days depending on the domain size and number of computational cores.
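For orientation, a minimal single-component D2Q9 lattice Boltzmann collide-and-stream step is sketched below in numpy. It shows the data-parallel structure that makes the method scale, but omits the paper's cellular automaton coupling, solute transport, and MPI decomposition.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and weights.
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def equilibrium(rho, u):
    """Second-order equilibrium distribution for density rho, velocity u."""
    cu = np.einsum('ia,xya->ixy', c, u)
    usq = np.einsum('xya,xya->xy', u, u)
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def step(f, tau):
    """One BGK collision followed by periodic streaming."""
    rho = f.sum(axis=0)
    u = np.einsum('ia,ixy->xya', c, f) / rho[..., None]
    f += -(f - equilibrium(rho, u)) / tau            # collide
    for i, ci in enumerate(c):                       # stream
        f[i] = np.roll(f[i], shift=tuple(ci), axis=(0, 1))
    return f

nx = ny = 64
f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny, 2)))
for _ in range(100):
    f = step(f, tau=0.8)
```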
Scientific Data Services -- A High-Performance I/O System with Array Semantics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Byna, Surendra; Rotem, Doron
2011-09-21
As high-performance computing approaches exascale, the existing I/O system design is having trouble keeping pace in both performance and scalability. We propose to address this challenge by adopting database principles and techniques in parallel I/O systems. First, we propose to adopt an array data model because many scientific applications represent their data in arrays. This strategy follows a cardinal principle from database research, which separates the logical view from the physical layout of data. This high-level data model gives the underlying implementation more freedom to optimize the physical layout and to choose the most effective way of accessing the data. For example, knowing that a set of write operations is working on a single multi-dimensional array makes it possible to keep the subarrays in a log structure during the write operations and reassemble them later into another physical layout as resources permit. While maintaining the high-level view, the storage system could compress the user data to reduce the physical storage requirement, collocate data records that are frequently used together, or replicate data to increase availability and fault-tolerance. Additionally, the system could generate secondary data structures such as database indexes and summary statistics. We expect the proposed Scientific Data Services approach to create a "live" storage system that dynamically adjusts to user demands and evolves with the massively parallel storage hardware.
Parallel task processing of very large datasets
NASA Astrophysics Data System (ADS)
Romig, Phillip Richardson, III
This research concerns the use of distributed computer technologies for the analysis and management of very large datasets. Improvements in sensor technology, an emphasis on global change research, and greater access to data warehouses all increase the number of non-traditional users of remotely sensed data. We present a framework for distributed solutions to the challenges of datasets which exceed the online storage capacity of individual workstations. This framework, called parallel task processing (PTP), incorporates both the task- and data-level parallelism exemplified by many image processing operations. An implementation based on the principles of PTP, called Tricky, is also presented. Additionally, we describe the challenges and practical issues in modeling the performance of parallel task processing with large datasets. We present a mechanism for estimating the running time of each unit of work within a system and an algorithm that uses these estimates to simulate the execution environment and produce estimated runtimes. Finally, we describe and discuss experimental results which validate the design. Specifically, the system (a) is able to perform computation on datasets which exceed the capacity of any one disk, (b) provides a reduction of overall computation time as a result of the task distribution, even with the additional cost of data transfer and management, and (c) in simulation mode accurately predicts the performance of the real execution environment.
Solving Coupled Gross--Pitaevskii Equations on a Cluster of PlayStation 3 Computers
NASA Astrophysics Data System (ADS)
Edwards, Mark; Heward, Jeffrey; Clark, C. W.
2009-05-01
At Georgia Southern University we have constructed an 8+1-node cluster of Sony PlayStation 3 (PS3) computers with the intention of using this computing resource to solve problems related to the behavior of ultra-cold atoms in general, with a particular emphasis on studying Bose-Bose and Bose-Fermi mixtures confined in optical lattices. As a first project that uses this computing resource, we have implemented a parallel solver of the coupled time-dependent, one-dimensional Gross-Pitaevskii (TDGP) equations. These equations govern the behavior of dual-species bosonic mixtures. We chose the split-operator/FFT method to solve the coupled 1D TDGP equations. The fast Fourier transform component of this solver can be readily parallelized on the PS3 CPU known as the Cell Broadband Engine (CellBE). Each CellBE chip contains a single 64-bit PowerPC Processor Element known as the PPE and eight "Synergistic Processor Elements" identified as the SPEs. We report on this algorithm and compare its performance to a non-parallel solver as applied to modeling evaporative cooling in dual-species bosonic mixtures.
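A hedged serial sketch of the split-operator/FFT step for two coupled 1D GP equations follows (unit constants and hypothetical coupling strengths; the PS3 version distributes the FFT work across the SPEs):

```python
import numpy as np

n, dx, dt = 512, 0.1, 1e-3
x = (np.arange(n) - n // 2) * dx
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)
V1 = V2 = 0.5 * x**2                    # harmonic trap
g11, g22, g12 = 1.0, 1.0, 0.5           # hypothetical intra/inter-species couplings

def nl_half(p1, p2):
    """Half-step of the (diagonal) potential + mean-field part."""
    U1 = V1 + g11 * np.abs(p1)**2 + g12 * np.abs(p2)**2
    U2 = V2 + g22 * np.abs(p2)**2 + g12 * np.abs(p1)**2
    return p1 * np.exp(-0.5j * dt * U1), p2 * np.exp(-0.5j * dt * U2)

def tdgp_step(p1, p2):
    """Strang splitting: nonlinear half-step, kinetic step in k-space, nonlinear half-step."""
    p1, p2 = nl_half(p1, p2)
    kin = np.exp(-0.5j * dt * k**2)
    p1 = np.fft.ifft(kin * np.fft.fft(p1))
    p2 = np.fft.ifft(kin * np.fft.fft(p2))
    return nl_half(p1, p2)

psi1 = psi2 = np.exp(-x**2).astype(complex)
for _ in range(1000):
    psi1, psi2 = tdgp_step(psi1, psi2)
```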
NASA Astrophysics Data System (ADS)
Wang, Hao-Yu; Wu, Jhao-Ting; Chow, Chi-Wai; Liu, Yang; Yeh, Chien-Hung; Liao, Xin-Lan; Lin, Kun-Hsien; Wu, Wei-Liang; Chen, Yi-Yuan
2018-01-01
Using a solar cell (or photovoltaic cell) for visible light communication (VLC) is attractive. Apart from acting as a VLC receiver (Rx), the solar cell can provide energy harvesting. This can be used in self-powered smart devices, particularly in the emerging "Internet of Things" (IoT) networks. Here, we propose and demonstrate for the first time the use of a pre-distorted pulse-amplitude-modulation (PAM-4) signal and a parallel resistance circuit to enhance the transmission performance of solar-cell-Rx-based VLC. Pre-distortion is a simple non-adaptive equalization technique that can significantly mitigate the slow charging and discharging of the solar cell. The equivalent circuit model of the solar cell and the operation of using a parallel resistance to increase the bandwidth of the solar cell are discussed. Using the proposed schemes, the experimental results show that the data rate of the solar-cell-Rx-based VLC can increase from 20 kbit/s to 1.25 Mbit/s (about 60 times), with the bit error rate (BER) satisfying the 7% forward error correction (FEC) limit.
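A hedged baseband sketch of the two signal-side ideas is given below; the level mapping and the first-order pre-emphasis constant are illustrative, not the paper's measured circuit values.

```python
import numpy as np

def pam4(bits):
    """Map bit pairs to four equally spaced amplitude levels in [0, 1]."""
    symbols = bits.reshape(-1, 2) @ np.array([2, 1])   # 00,01,10,11 -> 0..3
    return symbols / 3.0

def predistort(levels, alpha=0.5):
    """First-order pre-emphasis: overshoot each level transition to counteract
    the solar cell's slow RC charging (alpha is a hypothetical strength)."""
    out = levels.astype(float).copy()
    out[1:] += alpha * np.diff(levels)
    return np.clip(out, 0.0, 1.0)

tx = predistort(pam4(np.random.randint(0, 2, 64)))
```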
Message Passing and Shared Address Space Parallelism on an SMP Cluster
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing
NASA Astrophysics Data System (ADS)
Decyk, V. K.; Dauger, D. E.
We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
NASA Astrophysics Data System (ADS)
Zatarain Salazar, Jazmin; Reed, Patrick M.; Quinn, Julianne D.; Giuliani, Matteo; Castelletti, Andrea
2017-11-01
Reservoir operations are central to our ability to manage river basin systems serving conflicting multi-sectoral demands under increasingly uncertain futures. These challenges motivate the need for new solution strategies capable of effectively and efficiently discovering the multi-sectoral tradeoffs that are inherent to alternative reservoir operation policies. Evolutionary many-objective direct policy search (EMODPS) is gaining importance in this context due to its capability of addressing multiple objectives and its flexibility in incorporating multiple sources of uncertainties. This simulation-optimization framework has high potential for addressing the complexities of water resources management, and it can benefit from current advances in parallel computing and meta-heuristics. This study contributes a diagnostic assessment of state-of-the-art parallel strategies for the auto-adaptive Borg Multi Objective Evolutionary Algorithm (MOEA) to support EMODPS. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple sectoral demands from hydropower production, urban water supply, recreation and environmental flows need to be balanced. Using EMODPS with different parallel configurations of the Borg MOEA, we optimize operating policies over different size ensembles of synthetic streamflows and evaporation rates. As we increase the ensemble size, we increase the statistical fidelity of our objective function evaluations at the cost of higher computational demands. This study demonstrates how to overcome the mathematical and computational barriers associated with capturing uncertainties in stochastic multiobjective reservoir control optimization, where parallel algorithmic search serves to reduce the wall-clock time in discovering high quality representations of key operational tradeoffs. Our results show that emerging self-adaptive parallelization schemes exploiting cooperative search populations are crucial. Such strategies provide a promising new set of tools for effectively balancing exploration, uncertainty, and computational demands when using EMODPS.
NASA Astrophysics Data System (ADS)
Gao, Xiatian; Wang, Xiaogang; Jiang, Binhao
2017-10-01
UPSF (Universal Plasma Simulation Framework) is a new plasma simulation code designed for maximum flexibility by using cutting-edge techniques supported by the C++17 standard. Through use of metaprogramming techniques, UPSF provides arbitrary-dimensional data structures and methods to support various kinds of plasma simulation models, such as Vlasov, particle-in-cell (PIC), fluid, and Fokker-Planck models, together with their variants and hybrid methods. Through C++ metaprogramming, a single code can be applied to systems of arbitrary dimensionality with no loss of performance. UPSF can also automatically parallelize the distributed data structure and accelerate matrix and tensor operations with BLAS. A three-dimensional particle-in-cell code has been developed based on UPSF. Two test cases, Landau damping and the Weibel instability for the electrostatic and electromagnetic cases respectively, are presented to show the validity and performance of the UPSF code.
Target--distractor separation and feature integration in visual attention to letters.
Driver, J; Baylis, G C
1991-04-01
The interference produced by distractor letters diminishes with increasing distance from a target letter, as if the distractors fall outside an attentional spotlight focussed on the target (Eriksen and Eriksen 1974). We examine Hagenaar and Van der Heijden's (1986) claim that this distance effect is an acuity artefact. Feature integration theory (Treisman 1986) predicts that even when acuity is controlled for, distance effects should be found when interference is produced by conjoined distractor features (e.g. letter-identities), but not when interference arises from isolated distractor features (e.g. letter-strokes). The opposite pattern of results is found. A model is proposed in which both letter-strokes and letter-identities are derived in parallel. The location of letter-strokes can also be coded in parallel, but locating letter-identities may require the operation of attention.
Planning in subsumption architectures
NASA Technical Reports Server (NTRS)
Chalfant, Eugene C.
1994-01-01
A subsumption planner using a parallel distributed computational paradigm based on the subsumption architecture for control of real-world-capable robots is described. Virtual sensor state space is used as a planning tool to visualize the robot's anticipated effect on its environment. Decision sequences are generated based on the environmental situation expected at the time the robot must commit to a decision. Between decision points, the robot performs in a preprogrammed manner. A rudimentary, domain-specific partial world model contains enough information to extrapolate the end results of the rote behavior between decision points. A collective network of predictors operates in parallel with the reactive network, forming a recurrent network which generates plans as a hierarchy. Details of a plan segment are generated only when its execution is imminent. The use of the subsumption planner is demonstrated by a simple maze navigation problem.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method
NASA Astrophysics Data System (ADS)
Bi, Xunqiang
1997-12-01
In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massively parallel computer made by the National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. Potential ways to increase the speed-up ratio and to exploit the resources of future massively parallel supercomputers are also discussed.
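The essence of a two-dimensional domain decomposition can be sketched in a few lines. The following Python fragment (the near-even partitioning rule and all names are illustrative, not the IAP model's actual scheme) maps a process rank on a logical process grid to the block of latitude/longitude points it owns.

    def decompose_2d(nlat, nlon, pr, pc, rank):
        # Return the (lat, lon) index ranges owned by `rank` on a
        # pr x pc logical process grid.
        i, j = divmod(rank, pc)              # process coordinates
        def block(n, p, k):                  # near-even 1-D partition
            base, rem = divmod(n, p)
            start = k * base + min(k, rem)
            return start, start + base + (1 if k < rem else 0)
        lat0, lat1 = block(nlat, pr, i)
        lon0, lon1 = block(nlon, pc, j)
        return (lat0, lat1), (lon0, lon1)

    # Example: a 64 x 128 grid distributed over a 4 x 8 process grid.
    for r in (0, 31):
        print(r, decompose_2d(64, 128, 4, 8, r))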
Ghosh, Arup; Qin, Shiming; Lee, Jooyeoun; Wang, Gi-Nam
2016-01-01
Operational faults and behavioural anomalies associated with PLC control processes often occur in manufacturing systems, and their real-time identification is necessary in the manufacturing industry. In this paper, we present an automated tool, called the PLC Log-Data Analysis Tool (PLAT), that can detect them by using log-data records of the PLC signals. PLAT automatically creates a nominal model of the PLC control process and employs a novel hash-table-based indexing and searching scheme for this purpose. Our experiments show that PLAT is significantly fast, provides real-time identification of operational faults and behavioural anomalies, and can execute within a small memory footprint. In addition, PLAT can easily handle a large manufacturing system with a reasonable computing configuration and can be installed in parallel to the data-logging system to identify operational faults and behavioural anomalies effectively.
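The core of such log-data indexing can be conveyed with an ordinary hash map. This Python sketch (the record format and signal names are invented for illustration, and the real tool's nominal model is far richer) indexes observed PLC signal transitions and flags one that is absent from the nominal index.

    from collections import defaultdict

    def build_signal_index(log_records):
        # Index log-data records by (signal, value) so that membership
        # checks against the nominal model take amortized constant time.
        index = defaultdict(list)
        for t, signal, value in log_records:
            index[(signal, value)].append(t)
        return index

    nominal_log = [(0, "conveyor", "ON"), (3, "gripper", "CLOSE"),
                   (7, "conveyor", "OFF")]
    idx = build_signal_index(nominal_log)

    # A transition never seen in the nominal model flags an anomaly.
    observed = ("gripper", "OPEN")
    if observed not in idx:
        print("anomalous transition:", observed)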
NASA Astrophysics Data System (ADS)
Decyk, Viktor K.; Dauger, Dean E.
We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Ropes: Support for collective operations among distributed threads
NASA Technical Reports Server (NTRS)
Haines, Matthew; Mehrotra, Piyush; Cronk, David
1995-01-01
Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. Recently, systems have been designed and implemented to support interprocessor communication between lightweight threads so that threads can be exploited in a distributed memory system. Their use, in this setting, has been largely restricted to supporting latency hiding techniques and functional parallelism within a single application. However, to execute data parallel codes independent of other threads in the system, collective operations and relative indexing among threads are required. This paper describes the design of ropes: a scoping mechanism for collective operations and relative indexing among threads. We present the design of ropes in the context of the Chant system, and provide performance results evaluating our initial design decisions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, K.R.; Hansen, F.R.; Napolitano, L.M.
1992-01-01
DART (DSP Array for Reconfigurable Tasks) is a parallel architecture of two high-performance DSP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processors in DART is programmable in a high-level language (C or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processing. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability by using DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.
Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.
Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
2017-01-01
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
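The MapReduce pattern underlying such a pipeline can be sketched as follows. The region binning and record format are simplifying assumptions, and the real pipeline drives existing tools such as STAR and GATK on each bucket rather than the toy functions shown here.

    from collections import defaultdict

    def map_reads_to_regions(reads, region_size=1_000_000):
        # Map step: key each aligned read by (chromosome, region bin) so
        # downstream tools can process the bins concurrently.
        for chrom, pos, read in reads:
            yield (chrom, pos // region_size), read

    def shuffle(pairs):
        # Group map output by key, as the MapReduce runtime would.
        bins = defaultdict(list)
        for key, read in pairs:
            bins[key].append(read)
        return bins

    reads = [("chr1", 120, "r1"), ("chr1", 1_000_200, "r2"), ("chr2", 50, "r3")]
    for region, bucket in shuffle(map_reads_to_regions(reads)).items():
        # Reduce step: in the real pipeline each bucket would be handed to
        # a variant caller instance running in parallel with the others.
        print(region, bucket)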
Image Processing Using a Parallel Architecture.
1987-12-01
ENG/87D-25. This study developed a set of low-level image processing tools on a parallel computer that allows concurrent processing of images... the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations... as a step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer...
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
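The gain from multi-threading a numerically intensive kernel can be illustrated in Python with a thread pool driving BLAS-backed products, which release the interpreter lock and therefore genuinely overlap. This is only a schematic two-index analogue of the four-index transformation discussed above, with toy sizes, not the authors' implementation.

    import numpy as np
    from concurrent.futures import ThreadPoolExecutor

    n = 64
    integrals = np.random.rand(n, n)
    coeffs = np.random.rand(n, n)

    def transform_column(i):
        # One independent output column; the matrix products run in BLAS,
        # outside the interpreter lock, so threads proceed concurrently.
        return coeffs.T @ integrals @ coeffs[:, i]

    with ThreadPoolExecutor(max_workers=4) as pool:
        transformed = np.column_stack(list(pool.map(transform_column, range(n))))
    print(transformed.shape)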
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E.
2010-03-02
Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collective network such that the master node is a physical root.
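The partitioning and master-designation steps can be mimicked in a few lines of Python. The round-robin subgrouping and the lowest-rank master rule below are illustrative choices, not the patent's prescribed class-routing mechanism.

    def partition_into_subgroups(node_ids, num_subgroups):
        # Split an operational group into non-overlapping subgroups and
        # designate the lowest-ranked node of each as its master (root).
        subgroups = [node_ids[k::num_subgroups] for k in range(num_subgroups)]
        return [(group[0], group) for group in subgroups]

    nodes = list(range(16))
    for master, group in partition_into_subgroups(nodes, 4):
        print(f"master {master}: {group}")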
Distributed intelligence for supervisory control
NASA Technical Reports Server (NTRS)
Wolfe, W. J.; Raney, S. D.
1987-01-01
Supervisory control systems must deal with various types of intelligence distributed throughout the layers of control. Typical layers are real-time servo control, off-line planning and reasoning subsystems and finally, the human operator. Design methodologies must account for the fact that the majority of the intelligence will reside with the human operator. Hierarchical decompositions and feedback loops as conceptual building blocks that provide a common ground for man-machine interaction are discussed. Examples of types of parallelism and parallel implementation on several classes of computer architecture are also discussed.
Cryogenic parallel, single phase flows: an analytical approach
NASA Astrophysics Data System (ADS)
Eichhorn, R.
2017-02-01
Managing the cryogenic flows inside a state-of-the-art accelerator cryomodule has become a demanding endeavour: in order to build highly efficient modules, all heat transfers are usually intercepted at various temperatures. For a multi-cavity module operated at 1.8 K, this requires intercepts at 4 K and at 80 K at different locations, with sometimes strongly varying heat loads which, for simplicity, are operated in parallel. This contribution describes an analytical approach based on optimization theory.
Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R
2011-01-01
This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
DasPy – Open Source Multivariate Land Data Assimilation Framework with High Performance Computing
NASA Astrophysics Data System (ADS)
Han, Xujun; Li, Xin; Montzka, Carsten; Kollet, Stefan; Vereecken, Harry; Hendricks Franssen, Harrie-Jan
2015-04-01
Data assimilation has become a popular method to integrate observations from multiple sources with land surface models to improve predictions of the water and energy cycles of the soil-vegetation-atmosphere continuum. In recent years, several land data assimilation systems have been developed in different research agencies. Because of software availability or adaptability constraints, these systems are not easy to apply for multivariate land data assimilation research. Multivariate data assimilation refers to the simultaneous assimilation of observation data for multiple model state variables into a simulation model. Our main motivation was to develop an open source multivariate land data assimilation framework (DasPy), which is implemented in the Python scripting language mixed with C++ and Fortran. This system has been evaluated in several soil moisture, L-band brightness temperature and land surface temperature assimilation studies. The implementation also allows parameter estimation (soil properties and/or leaf area index) on the basis of the joint state and parameter estimation approach. LETKF (Local Ensemble Transform Kalman Filter) is implemented as the main data assimilation algorithm, and uncertainties in the data assimilation can be represented by perturbed atmospheric forcings, perturbed soil and vegetation properties and model initial conditions. The CLM4.5 (Community Land Model) was integrated as the model operator. The CMEM (Community Microwave Emission Modelling Platform), COSMIC (COsmic-ray Soil Moisture Interaction Code) and the two-source formulation were integrated as observation operators for assimilation of L-band passive microwave, cosmic-ray soil moisture probe and land surface temperature measurements, respectively. DasPy is parallelized using the hybrid MPI (Message Passing Interface) and OpenMP (Open Multi-Processing) techniques. All the input and output data flow is organized efficiently using the commonly used NetCDF file format. Online 1D and 2D visualization of data assimilation results is also implemented to facilitate post-simulation analysis. In summary, DasPy is a ready-to-use open source parallel multivariate land data assimilation framework.
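To convey the flavor of the ensemble update at the heart of such a system, here is a stochastic ensemble Kalman filter sketch in NumPy. Note that this is a simplified global EnKF rather than the localized LETKF that DasPy implements, and the state, observation operator and dimensions are toy assumptions.

    import numpy as np

    def enkf_update(X, y, H, R, rng):
        # Stochastic EnKF update. X: (n_state, n_ens) ensemble; y: (n_obs,).
        n_ens = X.shape[1]
        A = X - X.mean(axis=1, keepdims=True)           # ensemble anomalies
        HX = H @ X
        HA = HX - HX.mean(axis=1, keepdims=True)
        PHt = A @ HA.T / (n_ens - 1)                    # cross covariance
        S = HA @ HA.T / (n_ens - 1) + R                 # innovation covariance
        K = PHt @ np.linalg.solve(S, np.eye(len(y)))    # Kalman gain
        # Perturb observations, one draw per ensemble member.
        Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, n_ens).T
        return X + K @ (Y - HX)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(3, 50))                        # toy 3-layer states
    H = np.array([[1.0, 0.0, 0.0]])                     # observe top layer only
    Xa = enkf_update(X, np.array([0.3]), H, 0.01 * np.eye(1), rng)
    print(Xa.mean(axis=1))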
NASA Astrophysics Data System (ADS)
Noureldin, K.; González-Escalada, L. M.; Hirsch, T.; Nouri, B.; Pitz-Paal, R.
2016-05-01
A large number of commercial and research line-focusing solar power plants are in operation and under development. Such plants include parabolic trough collectors (PTC) or linear Fresnel collectors using thermal oil or molten salt as the heat transfer medium (HTM). However, the continuously varying and dynamic solar conditions represent a major challenge for plant control in order to optimize power production and keep operation safe. A better understanding of the behaviour of such power plants under transient conditions will help reduce defocusing instances, improve field control, and hence increase the energy yield and confidence in this new technology. Computational methods are very powerful and cost-effective tools to gain such understanding. However, most simulation models described in the literature assume equal mass flow distributions among the parallel loops in the field or totally decouple the flow and thermal conditions. In this paper, a new numerical model to simulate a whole solar field with a single-phase HTM is described. The proposed model consists of a hydraulic part and a thermal part that are coupled to account for the effect of the thermal condition of the field on the flow distribution among the parallel loops. The model is specifically designed for large line-focusing solar fields, offering a high degree of flexibility in terms of layout, condition of the mirrors, and spatially resolved DNI data. Moreover, the model results have been compared to other simulation tools, as well as experimental and plant data, and the results show very good agreement. The model can provide more precise data to control algorithms to improve plant control. In addition, short-term and accurate spatially discretized DNI forecasts can be used as input to predict the field behaviour in advance. In this paper, the hydraulic and thermal parts, as well as the coupling procedure, are described, and some validation results and results of simulating an example field are shown.
Quadrocopter Control Design and Flight Operation
NASA Technical Reports Server (NTRS)
Karwoski, Katherine
2011-01-01
A limiting factor in control system design and analysis for spacecraft is the inability to physically test new algorithms quickly and cheaply. Test flights of space vehicles are costly and take much preparation. As such, EV41 recently acquired a small research quadrocopter that has the ability to be a test bed for new control systems. This project focused on learning how to operate, fly, and maintain the quadrocopter, as well as developing and testing protocols for its use. In parallel to this effort, developing a model in Simulink facilitated the design and analysis of simple control systems for the quadrocopter. Software provided by the manufacturer enabled testing of the Simulink control system on the vehicle.
Domain decomposition methods in computational fluid dynamics
NASA Technical Reports Server (NTRS)
Gropp, William D.; Keyes, David E.
1991-01-01
The divide-and-conquer paradigm of iterative domain decomposition, or substructuring, has become a practical tool in computational fluid dynamic applications because of its flexibility in accommodating adaptive refinement through locally uniform (or quasi-uniform) grids, its ability to exploit multiple discretizations of the operator equations, and the modular pathway it provides towards parallelism. These features are illustrated on the classic model problem of flow over a backstep using Newton's method as the nonlinear iteration. Multiple discretizations (second-order in the operator and first-order in the preconditioner) and locally uniform mesh refinement pay dividends separately, and they can be combined synergistically. Sample performance results are included from an Intel iPSC/860 hypercube implementation.
Research on rapid agile metrology for manufacturing based on real-time multitask operating system
NASA Astrophysics Data System (ADS)
Chen, Jihong; Song, Zhen; Yang, Daoshan; Zhou, Ji; Buckley, Shawn
1996-10-01
Rapid agile metrology for manufacturing (RAMM) using multiple non-contact sensors is likely to remain a growing trend in manufacturing. High-speed inspection systems for manufacturing are characterized by multiple tasks implemented in parallel and real-time events which occur simultaneously. In this paper, we introduce a real-time operating system into RAMM research. A general task model based on class-based object-oriented technology is proposed. A general multitask frame of a typical RAMM system using OPNet is discussed. Finally, an application example of a machine which inspects parts held on a carrier strip is described. With RTOS and OPNet, this machine can measure two dimensions of the contacts at 300 parts/second.
The BLAZE language: A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine-grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse-grained parallelism using machine-specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
NASA Astrophysics Data System (ADS)
Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Hong, Yang; Zuo, Depeng; Ren, Minglei; Lei, Tianjie; Liang, Ke
2018-01-01
Hydrological model calibration has been a hot issue for decades. The shuffled complex evolution method developed at the University of Arizona (SCE-UA) has been proved to be an effective and robust optimization approach. However, its computational efficiency deteriorates significantly when the amount of hydrometeorological data increases. In recent years, the rise of heterogeneous parallel computing has brought hope for the acceleration of hydrological model calibration. This study proposed a parallel SCE-UA method and applied it to the calibration of a watershed rainfall-runoff model, the Xinanjiang model. The parallel method was implemented on heterogeneous computing systems using OpenMP and CUDA. Performance testing and sensitivity analysis were carried out to verify its correctness and efficiency. Comparison results indicated that heterogeneous parallel computing-accelerated SCE-UA converged much more quickly than the original serial version and possessed satisfactory accuracy and stability for the task of fast hydrological model calibration.
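The parallelization opportunity exploited in such work is the concurrent evaluation of candidate parameter sets. A minimal process-parallel sketch in Python follows, with a toy least-squares objective standing in for the expensive Xinanjiang model run; the population layout and all names are illustrative, not the authors' OpenMP/CUDA implementation.

    import numpy as np
    from multiprocessing import Pool

    def objective(params):
        # Toy stand-in for a rainfall-runoff simulation scored against
        # observations; in practice this is the step worth parallelizing.
        x = np.linspace(0.0, 1.0, 100)
        simulated = np.polyval(params, x)
        observed = x ** 2
        return np.mean((simulated - observed) ** 2)

    def evaluate_population(population):
        # Evaluate all candidates concurrently, mirroring the per-complex
        # parallelism of a parallel SCE-UA scheme.
        with Pool() as pool:
            return pool.map(objective, population)

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        population = [rng.normal(size=3) for _ in range(32)]
        print(min(evaluate_population(population)))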
Iterative algorithms for large sparse linear systems on parallel computers
NASA Technical Reports Server (NTRS)
Adams, L. M.
1982-01-01
Algorithms are developed for assembling in parallel the sparse systems of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed, and results of this model for the algorithms are given.
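Stationary iterations of the Jacobi type are attractive on array architectures precisely because every component update in a sweep is independent. A compact NumPy sketch (using a 1-D Poisson-style test matrix chosen for illustration) makes the data parallelism visible.

    import numpy as np

    def jacobi(A, b, iterations=200):
        # Jacobi iteration: each update depends only on the previous
        # iterate, so all n component updates can proceed in parallel.
        D = np.diag(A)
        R = A - np.diagflat(D)
        x = np.zeros_like(b)
        for _ in range(iterations):
            x = (b - R @ x) / D      # one fully data-parallel sweep
        return x

    n = 32
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Poisson matrix
    b = np.ones(n)
    print(np.linalg.norm(A @ jacobi(A, b) - b))           # residual norm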
The Data Transfer Kit: A geometric rendezvous-based tool for multiphysics data transfer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slattery, S. R.; Wilson, P. P. H.; Pawlowski, R. P.
2013-07-01
The Data Transfer Kit (DTK) is a software library designed to provide parallel data transfer services for arbitrary physics components based on the concept of geometric rendezvous. The rendezvous algorithm provides a means to geometrically correlate two geometric domains that may be arbitrarily decomposed in a parallel simulation. By repartitioning both domains such that they have the same geometric domain on each parallel process, efficient and load balanced search operations and data transfer can be performed at a desirable algorithmic time complexity with low communication overhead relative to other types of mapping algorithms. With the increased development efforts in multiphysics simulation and other multiple mesh and geometry problems, generating parallel topology maps for transferring fields and other data between geometric domains is a common operation. The algorithms used to generate parallel topology maps based on the concept of geometric rendezvous as implemented in DTK are described with an example using a conjugate heat transfer calculation and thermal coupling with a neutronics code. In addition, we provide the results of initial scaling studies performed on the Jaguar Cray XK6 system at Oak Ridge National Laboratory for a worst-case-scenario problem in terms of algorithmic complexity that shows good scaling on O(1 x 10^4) cores for topology map generation and excellent scaling on O(1 x 10^5) cores for the data transfer operation with meshes of O(1 x 10^9) elements.
A Domain Decomposition Parallelization of the Fast Marching Method
NASA Technical Reports Server (NTRS)
Herrmann, M.
2003-01-01
In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G₀-based parallelization will be investigated.
Synchronizing compute node time bases in a parallel computer
Chen, Dong; Faraj, Daniel A; Gooding, Thomas M; Heidelberger, Philip
2015-01-27
Synchronizing time bases in a parallel computer that includes compute nodes organized for data communications in a tree network, where one compute node is designated as a root, and, for each compute node: calculating data transmission latency from the root to the compute node; configuring a thread as a pulse waiter; initializing a wakeup unit; and performing a local barrier operation; upon each node completing the local barrier operation, entering, by all compute nodes, a global barrier operation; upon all nodes entering the global barrier operation, sending, to all the compute nodes, a pulse signal; and for each compute node upon receiving the pulse signal: waking, by the wakeup unit, the pulse waiter; setting a time base for the compute node equal to the data transmission latency between the root node and the compute node; and exiting the global barrier operation.
Evaluation of parallel reduction strategies for fusion of sensory information from a robot team
NASA Astrophysics Data System (ADS)
Lyons, Damian M.; Leroy, Joseph
2015-05-01
The advantage of using a team of robots to search or to map an area is that by navigating the robots to different parts of the area, searching or mapping can be completed more quickly. A crucial aspect of the problem is the combination, or fusion, of data from team members to generate an integrated model of the search/mapping area. In prior work we looked at the issue of removing mutual robot views from an integrated point cloud model built from laser and stereo sensors, leading to a cleaner and more accurate model. This paper addresses a further challenge: even with mutual views removed, the stereo data from a team of robots can quickly swamp a WiFi connection. This paper proposes and evaluates a communication and fusion approach based on the parallel reduction operation, where data is combined in a series of steps over increasing subsets of the team. Eight different strategies for selecting the subsets are evaluated for bandwidth requirements using three robot missions, each carried out with teams of four Pioneer 3-AT robots. Our results indicate that selecting groups to combine based on similar pose but distant location yields the best results.
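The parallel reduction idea can be sketched independently of the robots and sensors. Here the subset-selection strategies are abstracted into a generic pairwise combine; the eight strategies evaluated in the paper differ only in how the pairs are chosen at each round, which this toy sketch does not model.

    def reduce_in_rounds(clouds, combine):
        # Pairwise (tree) reduction: in each round, members combine in
        # disjoint pairs, so the number of rounds grows logarithmically
        # with team size instead of sending everything to one robot.
        while len(clouds) > 1:
            nxt = [combine(clouds[i], clouds[i + 1])
                   for i in range(0, len(clouds) - 1, 2)]
            if len(clouds) % 2:
                nxt.append(clouds[-1])
            clouds = nxt
        return clouds[0]

    # Toy "point clouds" as sets of (x, y) tuples from four robots.
    team = [{(0, 0)}, {(1, 0)}, {(0, 1)}, {(1, 1)}]
    merged = reduce_in_rounds(team, lambda a, b: a | b)
    print(merged)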
Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk
2014-01-01
We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job-sequence- and machine-dependent setup times and with the job-splitting property. The first contribution of this paper is the introduction of novel algorithms which perform splitting and scheduling simultaneously with a variable number of subjobs. We propose a simple chromosome structure constituted by random key numbers for the hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but they create additional difficulty when hybrid factors in local search are implemented. We developed algorithms that adapt the results of the local search back into the genetic algorithm with a minimum relocation of the genes' random key numbers; this is the second contribution of the paper. The third contribution is three new MIP models which perform splitting and scheduling simultaneously. The fourth contribution is the implementation of GAspLAMIP, which lets us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature, and the results validate the effectiveness of the proposed algorithms.
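Random-key chromosomes decode real numbers into schedules by sorting. The following sketch uses one common convention (machine chosen by the integer part of each gene, order on the machine by the fractional part); this is an illustration of the general technique, not necessarily the authors' exact decoder.

    import math

    def decode_random_keys(chromosome, n_machines):
        # Integer part of each gene picks a machine; the fractional part
        # orders the jobs assigned to that machine.
        schedule = {m: [] for m in range(n_machines)}
        for job, gene in enumerate(chromosome):
            machine = min(int(gene), n_machines - 1)
            schedule[machine].append((math.modf(gene)[0], job))
        return {m: [job for _, job in sorted(keyed)]
                for m, keyed in schedule.items()}

    # Five jobs on two machines.
    print(decode_random_keys([0.31, 1.72, 0.08, 1.15, 0.99], 2))
    # -> {0: [2, 0, 4], 1: [3, 1]}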
NASA Astrophysics Data System (ADS)
Oh, Ki-Yong; Epureanu, Bogdan I.
2017-10-01
A 1-D phenomenological force model of a Li-ion battery pack is proposed to enhance the control performance of Li-ion battery cells in pack conditions for efficient performance and health management. The force model accounts for multiple swelling sources under the operational environment of electric vehicles to predict swelling-induced forces in pack conditions, i.e. mechanically constrained. The proposed force model not only incorporates structural nonlinearities due to Li-ion intercalation swelling, but also separates the overall range of states of charge into three ranges to account for phase transitions. Moreover, an approach to study cell-to-cell variations in pack conditions is proposed with serial and parallel combinations of linear and nonlinear stiffness, which account for battery cells and other components in the battery pack. The model is shown not only to accurately estimate the reaction force caused by swelling as a function of the state of charge, battery temperature and environmental temperature, but also to account for cell-to-cell variations due to temperature variations, SOC differences, and local degradation in a wide range of operational conditions of electric vehicles. Considering that the force model of Li-ion battery packs can account for many possible situations in actual operation, the proposed approach and model offer potential utility for the enhancement of current battery management systems and power management strategies.
Anchorage strength and slope stability of a landfill liner
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villard, P.; Gourc, J.P.; Feki, N.
1997-11-01
In order to determine reliable dimensions of an anchorage system and satisfactory operation of the watertight liner in a waste landfill, it is essential to make an accurate assessment of the tensions acting on the geosynthetics on the top of the slope. Experimental and theoretical studies have been carried out in parallel. The former concern a full-scale experiment undertaken in Montreuil sur Barse on a waste storage site with instrumented slope. The latter concern anchorage tests performed on a scale model for different anchorage geometries.
Parallel Attack and the Enemy’s Decision Making Process
1998-04-01
Notes on implementation of sparsely distributed memory
NASA Technical Reports Server (NTRS)
Keeler, J. D.; Denning, P. J.
1986-01-01
The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
Wu, Xiao-Lin; Sun, Chuanyu; Beissinger, Timothy M; Rosa, Guilherme Jm; Weigel, Kent A; Gatti, Natalia de Leon; Gianola, Daniel
2012-09-25
Most Bayesian models for the analysis of complex traits are not analytically tractable, and inferences are based on computationally intensive techniques. This is true of Bayesian models for genome-enabled selection, which uses whole-genome molecular data to predict the genetic merit of candidate animals for breeding purposes. In this regard, parallel computing can overcome the bottlenecks that can arise from serial computing. Hence, a major goal of the present study is to bridge the gap to high-performance Bayesian computation in the context of animal breeding and genetics. Parallel Markov chain Monte Carlo algorithms and strategies are described in the context of animal breeding and genetics. Parallel Monte Carlo algorithms are introduced as a starting point, including their applications to computing single-parameter and certain multiple-parameter models. Then, two basic approaches for parallel Markov chain Monte Carlo are described: one aims at parallelization within a single chain; the other is based on running multiple chains, yet some variants are discussed as well. Features and strategies of parallel Markov chain Monte Carlo are illustrated using real data, including a large beef cattle dataset with 50K SNP genotypes. Parallel Markov chain Monte Carlo algorithms are useful for computing complex Bayesian models, which not only leads to a dramatic speedup in computing but can also be used to optimize model parameters in complex Bayesian models. Hence, we anticipate that the use of parallel Markov chain Monte Carlo will have a profound impact on revolutionizing the computational tools for genomic selection programs.
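The multiple-chains approach is embarrassingly parallel, since chains with different seeds are independent. A minimal Python sketch follows, with a toy Metropolis sampler targeting a standard normal standing in for the far richer genomic-selection models discussed above.

    import numpy as np
    from multiprocessing import Pool

    def run_chain(seed, n_samples=5000):
        # One Metropolis chain targeting a standard normal density.
        rng = np.random.default_rng(seed)
        x, samples = 0.0, []
        for _ in range(n_samples):
            prop = x + rng.normal(scale=1.0)
            # log acceptance ratio for N(0, 1): 0.5 * (x^2 - prop^2)
            if np.log(rng.uniform()) < 0.5 * (x * x - prop * prop):
                x = prop
            samples.append(x)
        return np.array(samples)

    if __name__ == "__main__":
        with Pool(4) as pool:
            chains = pool.map(run_chain, [11, 22, 33, 44])
        # Comparing chain means also gives a crude convergence check.
        print([round(c.mean(), 3) for c in chains])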
Operator Finds Control at His Fingertips.
ERIC Educational Resources Information Center
Goscicki, Edward
1979-01-01
Discussed are the advantages associated with the use of computer systems in wastewater treatment facilities. The system parallels plant organization and considers operations, maintenance, and plant management. (CS)
NASA Technical Reports Server (NTRS)
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1983-01-01
A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data-dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not lock-stepped but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
An efficient parallel-processing method for transposing large matrices in place.
Portnoff, M R
1999-01-01
We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Gate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented as well as an extension for matrices large enough to require virtual memory.
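The cache-blocking idea behind such a transpose can be illustrated compactly. The sketch below is out-of-place for brevity, whereas the paper's algorithm works in place; each tile fits in cache, and independent tile pairs could be handed to separate processors.

    import numpy as np

    def blocked_transpose(a, block=64):
        # Transpose tile by tile: each (block x block) tile is touched
        # with cache-friendly locality, and tiles are independent work
        # units for multiple processors.
        n_rows, n_cols = a.shape
        out = np.empty((n_cols, n_rows), dtype=a.dtype)
        for i in range(0, n_rows, block):
            for j in range(0, n_cols, block):
                out[j:j + block, i:i + block] = a[i:i + block, j:j + block].T
        return out

    a = np.arange(12).reshape(3, 4)
    assert (blocked_transpose(a, block=2) == a.T).all()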
A model for optimizing file access patterns using spatio-temporal parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boonthanome, Nouanesengsy; Patchett, John; Geveci, Berk
2013-01-01
For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible file access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.
Optimisation of a parallel ocean general circulation model
NASA Astrophysics Data System (ADS)
Beare, M. I.; Stevens, D. P.
1997-10-01
This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 shared-memory multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
Solving Partial Differential Equations in a data-driven multiprocessor environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.
1988-12-31
Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-flow architectures have been proposed for the exploitation of large-scale parallelism. The implementation of some partial differential equation solvers (such as the Jacobi method) on a tagged-token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied, and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asynchronous methods in a data-driven environment. New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message-passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in data-flow PDE program graphs.
Exploring the Feasibility of a DNA Computer: Design of an ALU Using Sticker-Based DNA Model.
Sarkar, Mayukh; Ghosal, Prasun; Mohanty, Saraju P
2017-09-01
Since its inception, DNA computing has advanced to offer an extremely powerful, energy-efficient emerging technology for solving hard computational problems with its inherent massive parallelism and extremely high data density. This would be much more powerful and general-purpose when combined with well-known algorithmic solutions that exist for conventional computing architectures, using a suitable ALU. Thus, a specifically designed DNA Arithmetic and Logic Unit (ALU) that can address operations suitable for both domains can bridge the gap between the two. An ALU must be able to perform all possible logic operations, including NOT, OR, AND, XOR, NOR, NAND, and XNOR; compare and shift operations; and integer and floating-point arithmetic operations (addition, subtraction, multiplication, and division). In this paper, the design of an ALU is proposed using the sticker-based DNA model, with an experimental feasibility analysis. The novelties of this paper are manifold. First, the integer arithmetic operations performed here use two's complement arithmetic, and the floating-point operations follow the IEEE 754 floating-point format, closely resembling a conventional ALU. Also, the output of each operation can be reused for any subsequent operation, so any algorithm or program logic that users can think of can be implemented directly on the DNA computer without modification. Second, once the basic operations of the sticker model are automated, the implementations proposed in this paper become highly suitable for designing a fully automated ALU. Third, the proposed approaches are easy to implement. Finally, these approaches can work on sufficiently large binary numbers.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modifications to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load balance, and single-node code performance are discussed.
NASA Technical Reports Server (NTRS)
Zacharias, G. L.; Young, L. R.
1981-01-01
Measurements are made of manual control performance in the closed-loop task of nulling perceived self-rotation velocity about an earth-vertical axis. Self-velocity estimation is modeled as a function of the simultaneous presentation of vestibular and peripheral visual field motion cues. Based on measured low-frequency operator behavior in three visual field environments, a parallel channel linear model is proposed which has separate visual and vestibular pathways summing in a complementary manner. A dual-input describing function analysis supports the complementary model; vestibular cues dominate sensation at higher frequencies. The describing function model is extended by the proposal of a nonlinear cue conflict model, in which cue weighting depends on the level of agreement between visual and vestibular cues.
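A complementary blend of two velocity cues can be sketched as follows. The first-order filter, the time constant, and the synthetic signals are illustrative assumptions rather than the paper's fitted describing-function model; the sketch only shows the structural idea that low-passed visual and high-passed vestibular channels sum to an all-pass estimate.

    import numpy as np

    def complementary_velocity(visual, vestibular, tau=1.0, dt=0.01):
        # Parallel-channel blend: visual input is low-passed, vestibular
        # input contributes its increments (high-passed), and the two sum
        # to an estimate dominated by vision at low frequencies and by
        # the vestibular channel at high frequencies.
        alpha = tau / (tau + dt)
        est = np.zeros_like(visual)
        for k in range(1, len(visual)):
            est[k] = (alpha * (est[k - 1] + vestibular[k] - vestibular[k - 1])
                      + (1 - alpha) * visual[k])
        return est

    t = np.arange(0, 10, 0.01)
    true = np.sin(0.5 * t)
    rng = np.random.default_rng(0)
    visual = true + 0.05 * rng.normal(size=t.size)   # slow but noisy cue
    vestibular = true                                # tracks fast changes
    print(np.abs(complementary_velocity(visual, vestibular) - true).mean())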
A novel highly parallel algorithm for linearly unmixing hyperspectral images
NASA Astrophysics Data System (ADS)
Guerra, Raúl; López, Sebastián.; Callico, Gustavo M.; López, Jose F.; Sarmiento, Roberto
2014-10-01
Endmember extraction and abundance calculation represent critical steps within the process of linearly unmixing a given hyperspectral image for two main reasons. The first is the need to compute a set of accurate endmembers in order to further obtain confident abundance maps. The second refers to the huge number of operations involved in these time-consuming processes. This work proposes an algorithm that estimates the endmembers of a hyperspectral image under analysis and its abundances at the same time. The main advantages of this algorithm are its high degree of parallelization and the mathematical simplicity of the operations implemented. The algorithm estimates the endmembers as virtual pixels. In particular, it performs gradient descent to iteratively refine the endmembers and the abundances, reducing the mean square error in accordance with the linear unmixing model. Some mathematical restrictions must be added so that the method converges to a unique and realistic solution; owing to the nature of the algorithm, these restrictions can be easily implemented. The results obtained with synthetic images demonstrate the good behavior of the proposed algorithm. Moreover, the results obtained with the well-known Cuprite dataset also corroborate the benefits of our proposal.
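A minimal NumPy sketch of joint gradient-descent unmixing under the linear mixing model follows. The initialization, step size, and projection steps are illustrative assumptions rather than the authors' exact scheme; note that each pixel's abundance update is independent, which is where the parallelism comes from.

    import numpy as np

    def unmix(Y, p, iters=500, lr=1e-3, seed=0):
        # Jointly refine endmembers E and abundances A by gradient descent
        # on ||Y - E A||^2, with projections enforcing nonnegative,
        # sum-to-one abundances. Y: (bands, pixels).
        rng = np.random.default_rng(seed)
        bands, pixels = Y.shape
        E = Y[:, rng.choice(pixels, p, replace=False)]   # init from pixels
        A = np.full((p, pixels), 1.0 / p)
        for _ in range(iters):
            R = E @ A - Y                                # residual
            E -= lr * (R @ A.T)                          # grad wrt E
            A -= lr * (E.T @ R)                          # grad wrt A (per pixel)
            A = np.clip(A, 0.0, None)
            A /= A.sum(axis=0, keepdims=True) + 1e-12    # sum-to-one projection
        return E, A

    # Synthetic 5-band scene mixed from 3 endmembers.
    rng = np.random.default_rng(1)
    E_true = rng.uniform(size=(5, 3))
    A_true = rng.dirichlet(np.ones(3), size=200).T
    E, A = unmix(E_true @ A_true, p=3)
    print(np.linalg.norm(E_true @ A_true - E @ A))       # reconstruction error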
Generating performance portable geoscientific simulation code with Firedrake (Invited)
NASA Astrophysics Data System (ADS)
Ham, D. A.; Bercea, G.; Cotter, C. J.; Kelly, P. H.; Loriant, N.; Luporini, F.; McRae, A. T.; Mitchell, L.; Rathgeber, F.
2013-12-01
This presentation will demonstrate how a change in simulation programming paradigm can be exploited to deliver sophisticated simulation capability which is far easier to programme than are conventional models, is capable of exploiting different emerging parallel hardware, and is tailored to the specific needs of geoscientific simulation. Geoscientific simulation represents a grand challenge computational task: many of the largest computers in the world are tasked with this field, and the requirements of resolution and complexity of scientists in this field are far from being sated. However, single thread performance has stalled, even sometimes decreased, over the last decade, and has been replaced by ever more parallel systems: both as conventional multicore CPUs and in the emerging world of accelerators. At the same time, the needs of scientists to couple ever-more complex dynamics and parametrisations into their models makes the model development task vastly more complex. The conventional approach of writing code in low level languages such as Fortran or C/C++ and then hand-coding parallelism for different platforms by adding library calls and directives forces the intermingling of the numerical code with its implementation. This results in an almost impossible set of skill requirements for developers, who must simultaneously be domain science experts, numericists, software engineers and parallelisation specialists. Even more critically, it requires code to be essentially rewritten for each emerging hardware platform. Since new platforms are emerging constantly, and since code owners do not usually control the procurement of the supercomputers on which they must run, this represents an unsustainable development load. The Firedrake system, conversely, offers the developer the opportunity to write PDE discretisations in the high-level mathematical language UFL from the FEniCS project (http://fenicsproject.org). Non-PDE model components, such as parametrisations, can be written as short C kernels operating locally on the underlying mesh, with no explicit parallelism. The executable code is then generated in C, CUDA or OpenCL and executed in parallel on the target architecture. The system also offers features of special relevance to the geosciences. In particular, the large scale separation between the vertical and horizontal directions in many geoscientific processes can be exploited to offer the flexibility of unstructured meshes in the horizontal direction, without the performance penalty usually associated with those methods.
NASA Technical Reports Server (NTRS)
Farhat, Charbel
1998-01-01
In this grant, we have proposed a three-year research effort focused on developing High Performance Computation and Communication (HPCC) methodologies for structural analysis on parallel processors and clusters of workstations, with emphasis on reducing the structural design cycle time. Besides consolidating and further improving the FETI solver technology to address plate and shell structures, we have proposed to tackle the following design related issues: (a) parallel coupling and assembly of independently designed and analyzed three-dimensional substructures with non-matching interfaces, (b) fast and smart parallel re-analysis of a given structure after it has undergone design modifications, (c) parallel evaluation of sensitivity operators (derivatives) for design optimization, and (d) fast parallel analysis of mildly nonlinear structures. While our proposal was accepted, support was provided only for one year.
Commercial absorption chiller models for evaluation of control strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koeppel, E.A.; Klein, S.A.; Mitchell, J.W.
1995-08-01
A steady-state computer simulation model of a direct-fired double-effect water-lithium bromide absorption chiller in the parallel-flow configuration was developed from first principles. Unknown model parameters such as heat transfer coefficients were determined by matching the model's calculated state points and coefficient of performance (COP) against nominal full-load operating data and COPs obtained from a manufacturer's catalog. The model compares favorably with the manufacturer's performance ratings for varying water circuit (chilled and cooling) temperatures at full-load conditions and for chiller part-load performance. The model was used (1) to investigate the effect of varying the water circuit flow rates with the chiller load and (2) to optimize chiller part-load performance with respect to the distribution and flow of the weak solution.
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.
Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V
2016-03-08
The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
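For reference, the instance {2, 5, 9} can be checked by brute force: every subset of the set corresponds to one path an agent can take through the network, so the sums reachable by the agents are exactly the subset sums. The sketch below is an illustrative reference computation, not part of the reported device.

    # Brute-force reference for the subset sum instance {2, 5, 9}: each of
    # the 2^3 = 8 subsets corresponds to one path through the network,
    # explored physically in parallel by the molecular-motor agents.
    from itertools import combinations

    def reachable_sums(values):
        """Return the set of sums an agent can reach (one per subset/path)."""
        sums = set()
        for r in range(len(values) + 1):
            for subset in combinations(values, r):
                sums.add(sum(subset))
        return sums

    print(sorted(reachable_sums([2, 5, 9])))
    # -> [0, 2, 5, 7, 9, 11, 14, 16]; any target absent from this list
    #    (e.g. 3 or 12) is unreachable, answering the decision problem.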
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is to be applied on fine grids, its discretization leads to a huge computational problem, which implies that the model must be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, comparison results are presented from running the model on vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain-partitioning approach. The effective use of the cache and hierarchical memories of modern computers is discussed, together with the performance, speed-ups and efficiency achieved. The parallel code of DEM, created using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the model output are briefly presented.
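A minimal sketch of the domain-partitioning idea follows, written with mpi4py rather than the actual DEM code: each rank owns a strip of the computational domain and exchanges halo rows with its neighbours before each local update. All names and the toy update are illustrative assumptions.

    # Sketch of domain partitioning with halo exchange, the parallelization
    # pattern described for DEM. Each MPI rank holds a strip of the grid
    # plus two halo rows; the local update is a stand-in for the model's
    # advection/chemistry operators.
    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    nx_local, ny = 100, 480
    c = np.zeros((nx_local + 2, ny))      # pollutant field with halo rows

    up = rank - 1 if rank > 0 else MPI.PROC_NULL
    down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    for step in range(10):
        # exchange halo rows with neighbouring subdomains
        comm.Sendrecv(c[1], dest=up, recvbuf=c[-1], source=down)
        comm.Sendrecv(c[-2], dest=down, recvbuf=c[0], source=up)
        # local update on interior rows (toy diffusion stencil)
        c[1:-1] += 0.25 * (c[:-2] + c[2:] - 2.0 * c[1:-1])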
GPU accelerated cell-based adaptive mesh refinement on unstructured quadrilateral grid
NASA Astrophysics Data System (ADS)
Luo, Xisheng; Wang, Luying; Ran, Wei; Qin, Fenghua
2016-10-01
A GPU-accelerated inviscid flow solver is developed on an unstructured quadrilateral grid in the present work. For the first time, cell-based adaptive mesh refinement (AMR) is fully implemented on the GPU for an unstructured quadrilateral grid, which greatly reduces the frequency of data exchange between GPU and CPU. Specifically, the AMR is processed with atomic operations to parallelize list operations, and null memory recycling is realized to improve the efficiency of memory utilization. It is found that results obtained on GPUs agree very well with the exact or experimental results in the literature. An acceleration ratio of 4 is obtained between the parallel code running on the old GPU GT9800 and the serial code running on an E3-1230 V2 CPU. With the optimization of configuring a larger L1 cache and adopting shared-memory-based atomic operations on the newer GPU C2050, an acceleration ratio of 20 is achieved. The parallelized cell-based AMR processes achieve a 2x speedup on the GT9800 and 18x on the Tesla C2050, which demonstrates that running the cell-based AMR method in parallel on the GPU is feasible and efficient. Our results also indicate that new developments in GPU architecture significantly benefit fluid dynamics computing.
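The atomic-operation pattern used to parallelize the AMR list operations can be sketched as follows, with Numba's CUDA support standing in for the authors' native implementation; the kernel and variable names are assumptions.

    # Sketch of atomic list building on the GPU, the pattern behind
    # parallelized AMR cell lists: each thread that flags a cell for
    # refinement reserves a unique slot with an atomic add.
    import numpy as np
    from numba import cuda

    @cuda.jit
    def build_refine_list(flags, refine_list, counter):
        i = cuda.grid(1)
        if i < flags.shape[0] and flags[i]:
            slot = cuda.atomic.add(counter, 0, 1)  # reserve a unique slot
            refine_list[slot] = i                  # append cell index in parallel

    flags = np.random.rand(1 << 16) > 0.9          # cells flagged for refinement
    refine_list = cuda.device_array(flags.size, dtype=np.int64)
    counter = cuda.to_device(np.zeros(1, dtype=np.int32))
    build_refine_list.forall(flags.size)(cuda.to_device(flags), refine_list, counter)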
Ho, ThienLuan; Oh, Seung-Rohk
2017-01-01
Approximate string matching with k differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPU warp share data using the warp-shuffle operation instead of accessing shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experimental results for real DNA packages revealed that the proposed algorithm and its implementation achieved speedups of up to 122.64 and 1.53 times over a sequential CPU algorithm and a previous parallel approximate string matching algorithm on GPUs, respectively.
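For orientation, the underlying k-differences dynamic program, which the GPU implementation distributes across threads (with warp shuffles replacing shared-memory accesses), can be stated compactly in sequential form; this reference sketch is not the paper's parallel algorithm.

    # Sequential reference for approximate matching with k differences:
    # report end positions in `text` where `pattern` matches with edit
    # distance <= k (Sellers' dynamic program; D[0][j] = 0 so a match may
    # start anywhere in the text).
    def k_difference_matches(pattern, text, k):
        m = len(pattern)
        prev = list(range(m + 1))          # column for an empty text prefix
        hits = []
        for j, tc in enumerate(text, 1):
            curr = [0] * (m + 1)
            for i in range(1, m + 1):
                cost = 0 if pattern[i - 1] == tc else 1
                curr[i] = min(prev[i - 1] + cost,  # substitute/match
                              prev[i] + 1,         # delete from pattern
                              curr[i - 1] + 1)     # insert into pattern
            if curr[m] <= k:
                hits.append(j)             # match ends at text position j
            prev = curr
        return hits

    print(k_difference_matches("ACGT", "TTACGATTACGTA", k=1))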
Multitasking the three-dimensional transport code TORT on CRAY platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.
1996-04-01
The multitasking options in the three-dimensional neutral particle transport code TORT, originally implemented for Cray's CTSS operating system, are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall-clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
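The dynamic-scheduling idea behind the DP decomposition can be sketched generically: discrete angles are drawn from a shared work queue, so faster tasks absorb more angles and load imbalance shrinks. The sketch below uses Python threads as a stand-in for Cray multitasking; all names are illustrative.

    # Dynamic scheduling of discrete ordinates among tasks: each task pulls
    # the next available angle from a shared queue until the queue drains.
    import queue
    import threading

    def sweep_angle(angle):
        """Stand-in for the transport sweep over one discrete direction."""
        pass

    work = queue.Queue()
    for a in range(48):                    # one entry per discrete ordinate
        work.put(a)

    def task():
        while True:
            try:
                a = work.get_nowait()      # grab the next unprocessed angle
            except queue.Empty:
                return
            sweep_angle(a)

    tasks = [threading.Thread(target=task) for _ in range(8)]
    for t in tasks: t.start()
    for t in tasks: t.join()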
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sullivan, M.; Anderson, D.P.
1988-01-01
Marionette is a system for distributed parallel programming in an environment of networked heterogeneous computer systems. It is based on a master/slave model. The master process can invoke worker operations (asynchronous remote procedure calls to single slaves) and context operations (updates to the state of all slaves). The master and slaves also interact through shared data structures that can be modified only by the master. The master and slave processes are programmed in a sequential language. The Marionette runtime system manages slave process creation, propagates shared data structures to slaves as needed, queues and dispatches worker and context operations, and manages recovery from slave processor failures. The Marionette system also includes tools for automated compilation of program binaries for multiple architectures, and for distributing binaries to remote file systems. A UNIX-based implementation of Marionette is described.
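In modern terms, Marionette's master/slave pattern resembles the sketch below: the master issues asynchronous worker operations and remains the sole writer of shared state. This uses Python's concurrent.futures as an analogy, not Marionette's actual interface.

    # Master/slave sketch: asynchronous worker operations dispatched to
    # slaves, with shared state owned and updated only by the master.
    from concurrent.futures import ProcessPoolExecutor, as_completed

    def worker_operation(task):
        """Runs on a slave: a pure function of its inputs."""
        return task * task

    if __name__ == "__main__":
        shared_state = {}                  # modified only by the master
        with ProcessPoolExecutor(max_workers=4) as slaves:
            futures = {slaves.submit(worker_operation, t): t for t in range(16)}
            for f in as_completed(futures):        # asynchronous completions
                shared_state[futures[f]] = f.result()
        print(shared_state[7])             # -> 49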
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Boundedness and exponential convergence in a chemotaxis model for tumor invasion
NASA Astrophysics Data System (ADS)
Jin, Hai-Yang; Xiang, Tian
2016-12-01
We revisit the following chemotaxis system modeling tumor invasion:

$$u_t = \Delta u - \nabla\cdot(u\nabla v), \qquad v_t = \Delta v + wz, \qquad w_t = -wz, \qquad z_t = \Delta z - z + u, \qquad x \in \Omega,\ t > 0,$$

in a smooth bounded domain $\Omega \subset \mathbb{R}^n$ ($n \geq 1$) with homogeneous Neumann boundary and initial conditions. This model was recently proposed by Fujie et al (2014 Adv. Math. Sci. Appl. 24 67-84) as a model for tumor invasion with the role of extracellular matrix incorporated, and was analyzed later by Fujie et al (2016 Discrete Contin. Dyn. Syst. 36 151-69), showing uniform boundedness and convergence for $n \leq 3$. In this work, we first show that the $L^\infty$-boundedness of the system can be reduced to the boundedness of $\|u(\cdot,t)\|_{L^{n/4+\varepsilon}(\Omega)}$ for some $\varepsilon > 0$ alone, and then, for $n \geq 4$, if the initial data $\|u_0\|_{L^{n/4}}$, $\|z_0\|_{L^{n/2}}$ and …
Blocksome, Michael A [Rochester, MN
2011-12-20
Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation. The method includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon the number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value of the barrier counter in dependence upon each of the received barrier control packets; and exiting the barrier operation if the value of the barrier counter matches the exit value.
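The counter logic of the claim can be paraphrased in a small sketch: each node entering the barrier notionally broadcasts a control packet that adjusts every node's counter, and a node exits once its counter reaches the exit value. Below, a shared counter and condition variable stand in for the DMA engine and control packets; the class name is invented.

    # Counting-barrier sketch: each arrival "broadcasts" by decrementing
    # the counter; everyone exits when the counter matches the exit value.
    import threading

    class CountingBarrier:
        def __init__(self, n_nodes, exit_value=0):
            self.count = n_nodes           # configured on entering the barrier
            self.exit_value = exit_value
            self.cond = threading.Condition()

        def enter(self):
            with self.cond:
                self.count -= 1            # stands in for a barrier control packet
                if self.count == self.exit_value:
                    self.cond.notify_all() # last arrival releases everyone
                else:
                    self.cond.wait_for(lambda: self.count == self.exit_value)

    barrier = CountingBarrier(n_nodes=4)
    threads = [threading.Thread(target=barrier.enter) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()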
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. The approach achieves high computational efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly, using parallel computation, in the case of cellular programming, or implicitly, by taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures, in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
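The intrinsic parallelism of bitwise operators exploited by the genetic-programming variant can be illustrated as follows: one fitness case is packed into each bit of a machine word, so a single evaluation of a boolean expression scores many cases at once. The classifier formula and data below are invented for illustration.

    # Bitwise evaluation of multiple fitness cases at once: with one case
    # per bit, a single expression evaluation covers all 8 cases here
    # (64 on a 64-bit word) on a sequential CPU.
    CASES = 8

    # Each variable holds the value of one input bit across all 8 cases.
    x0 = 0b10110100
    x1 = 0b01110010

    def classifier(a, b):
        """Evolved boolean expression, applied to all cases simultaneously."""
        return (a & ~b) | (a ^ b)

    target = 0b11010110
    out = classifier(x0, x1) & ((1 << CASES) - 1)
    fitness = CASES - bin(out ^ target).count("1")  # matching bits = fitness
    print(f"fitness: {fitness}/{CASES}")            # -> fitness: 7/8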
Adding the 'heart' to hanging drop networks for microphysiological multi-tissue experiments.
Rismani Yazdi, Saeed; Shadmani, Amir; Bürgel, Sebastian C; Misun, Patrick M; Hierlemann, Andreas; Frey, Olivier
2015-11-07
Microfluidic hanging-drop networks enable culturing and analysis of 3D microtissue spheroids derived from different cell types under controlled perfusion and investigating inter-tissue communication in multi-tissue formats. In this paper we introduce a compact on-chip pumping approach for flow control in hanging-drop networks. The pump includes one pneumatic chamber located directly above one of the hanging drops and uses the surface tension at the liquid-air-interface for flow actuation. Control of the pneumatic protocol provides a wide range of unidirectional pulsatile and continuous flow profiles. With the proposed concept several independent hanging-drop networks can be operated in parallel with only one single pneumatic actuation line at high fidelity. Closed-loop medium circulation between different organ models for multi-tissue formats and multiple simultaneous assays in parallel are possible. Finally, we implemented a real-time feedback control-loop of the pump actuation based on the beating of a human iPS-derived cardiac microtissue cultured in the same system. This configuration allows for simulating physiological effects on the heart and their impact on flow circulation between the organ models on chip.
NASA Astrophysics Data System (ADS)
Chacon, Luis; Del-Castillo-Negrete, Diego; Hauck, Cory
2012-10-01
Modeling electron transport in magnetized plasmas is extremely challenging due to the extreme anisotropy between the parallel (to the magnetic field) and perpendicular directions (χ∥/χ⊥ ~ 10^10 in fusion plasmas). Recently, a Lagrangian Green's function approach, developed for the purely parallel transport case [D. del-Castillo-Negrete, L. Chacón, PRL 106, 195004 (2011); D. del-Castillo-Negrete, L. Chacón, Phys. Plasmas 19, 056112 (2012)], has been extended to the anisotropic transport case in the tokamak-ordering limit with constant density [L. Chacón, D. del-Castillo-Negrete, C. Hauck, JCP, submitted (2012)]. An operator-split algorithm is proposed that allows one to treat the Eulerian and Lagrangian components separately. The approach is shown to feature bounded numerical errors for arbitrary χ∥/χ⊥ ratios, which renders it asymptotic-preserving. In this poster, we will present the generalization of the Lagrangian approach to arbitrary magnetic fields. We will demonstrate the potential of the approach with various challenging configurations, including the case of transport across a magnetic island in cylindrical geometry.
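For reference, the anisotropic transport equation at issue, written in a common form (the notation here is assumed, not quoted from the poster):

    % Anisotropic transport equation: T is the transported scalar,
    % b = B/|B| the unit magnetic-field vector, and the stiffness comes
    % from the ratio chi_parallel / chi_perp ~ 10^10.
    \frac{\partial T}{\partial t}
        = \nabla \cdot \big[ \chi_\parallel \, \mathbf{b} \, (\mathbf{b} \cdot \nabla T) \big]
        + \nabla \cdot \big[ \chi_\perp \big( \nabla T - \mathbf{b} \, (\mathbf{b} \cdot \nabla T) \big) \big]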
Learning and Parallelization Boost Constraint Search
ERIC Educational Resources Information Center
Yun, Xi
2013-01-01
Constraint satisfaction problems are a powerful way to abstract and represent academic and real-world problems from both artificial intelligence and operations research. A constraint satisfaction problem is typically addressed by a sequential constraint solver running on a single processor. Rather than construct a new, parallel solver, this work…
Models of Wake-Vortex Spreading Mechanisms and Their Estimated Uncertainties
NASA Technical Reports Server (NTRS)
Rossow, Vernon J.; Hardy, Gordon H.; Meyn, Larry A.
2006-01-01
One of the primary constraints on the capacity of the nation's air transportation system is the landing capacity at its busiest airports. Many airports with nearly-simultaneous operations on closely-spaced parallel runways (i.e., as close as 750 ft (229 m)) suffer a severe decrease in runway acceptance rate when weather conditions do not allow full utilization. The objective of a research program at NASA Ames Research Center is to develop the technologies needed for traffic management in the airport environment so that operations now allowed on closely-spaced parallel runways under Visual Meteorological Conditions can also be carried out under Instrument Meteorological Conditions. As part of this overall research objective, the study reported here has developed improved models for the various aerodynamic mechanisms that spread and transport wake vortices. The purpose of the study is to continue the development of relationships that increase the accuracy of estimates for the along-trail separation distances available before the vortex wake of a leading aircraft intrudes into the airspace of a following aircraft. Details of the models used and their uncertainties are presented in the appendices to the paper. Suggestions are made as to the theoretical and experimental research needed to increase the accuracy of, and confidence level in, the models presented, and as to the instrumentation required for more precise estimates of the motion and spread of vortex wakes. The improved wake models indicate that, if the following aircraft is upwind of the leading aircraft, the vortex wakes of the leading aircraft will not intrude into the airspace of the following aircraft for about 7 s (based on pessimistic assumptions) for most atmospheric conditions. The wake-spreading models also indicate that longer time intervals before wake intrusion are available when atmospheric turbulence levels are mild or moderate. However, if the estimates for those time intervals are to be reliable, further study is necessary to develop the instrumentation and procedures needed to accurately define when the more benign atmospheric conditions exist.
The BLAZE language - A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1987-01-01
A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared-memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit the potential for improvement of serial code even for so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed, such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize these computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated, of which only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated, but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
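A language-neutral analogue of the approach (here in Python with process-based workers rather than MATLAB spmd composites) computes forward-difference Jacobian columns in parallel; the example system and all names are illustrative.

    # Parallel forward-difference Jacobian: each worker evaluates one
    # column, mirroring the column-wise decomposition behind the spmd
    # approach. The residual below stands in for an FE/MBD residual.
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor

    def residual(x):
        """Example nonlinear system F(x) = 0."""
        return np.array([x[0]**2 + x[1] - 3.0, x[0] + x[1]**2 - 5.0])

    def jac_column(args):
        x, f0, j, h = args
        xp = x.copy()
        xp[j] += h
        return (residual(xp) - f0) / h     # forward-difference column j

    def parallel_jacobian(x, h=1e-7, workers=2):
        f0 = residual(x)
        with ProcessPoolExecutor(max_workers=workers) as pool:
            cols = list(pool.map(jac_column, [(x, f0, j, h) for j in range(x.size)]))
        return np.column_stack(cols)

    if __name__ == "__main__":
        print(parallel_jacobian(np.array([1.0, 2.0])))
        # analytic Jacobian at (1, 2) is [[2, 1], [1, 4]]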
A Verification System for Distributed Objects with Asynchronous Method Calls
NASA Astrophysics Data System (ADS)
Ahrendt, Wolfgang; Dylla, Maximilian
We present a verification system for Creol, an object-oriented modeling language for concurrent distributed applications. The system is an instance of KeY, a framework for object-oriented software verification, which has so far been applied foremost to sequential Java. Building on KeY's characteristic concepts, like dynamic logic, sequent calculus, explicit substitutions, and the taclet rule language, the system presented in this paper addresses functional correctness of Creol models featuring local cooperative thread parallelism and global communication via asynchronous method calls. The calculus heavily operates on communication histories which describe the interfaces of Creol units. Two example scenarios demonstrate the usage of the system.
Modeling of composite beams and plates for static and dynamic analysis
NASA Technical Reports Server (NTRS)
Hodges, Dewey H.; Atilgan, Ali R.; Lee, Bok Woo
1990-01-01
A rigorous theory and corresponding computational algorithms were developed for a variety of problems regarding the analysis of composite beams and plates. The modeling approach is intended to be applicable to both static and dynamic analysis of generally anisotropic, nonhomogeneous beams and plates. Development of a theory for analysis of the local deformation of plates was the major focus. Some work was performed on global deformation of beams. Because of the strong parallel between beams and plates, the two were treated together as thin bodies, especially where this clarifies the meaning of certain terminology and the motivation behind certain mathematical operations.
Time-partitioning simulation models for calculation on parallel computers
NASA Technical Reports Server (NTRS)
Milner, Edward J.; Blech, Richard A.; Chima, Rodrick V.
1987-01-01
A technique allowing time-staggered solution of partial differential equations is presented in this report. Using this technique, called time-partitioning, simulation execution speedup is proportional to the number of processors used because all processors operate simultaneously, with each updating the solution grid at a different time point. The technique is limited neither by the number of processors available nor by the dimension of the solution grid. Time-partitioning was used to obtain the flow pattern through a cascade of airfoils, modeled by the Euler partial differential equations. An execution speedup factor of 1.77 was achieved using a two-processor Cray X-MP/24 computer.
NASA Astrophysics Data System (ADS)
He, Jun; Gao, Feng; Bai, Yongjun; Wu, Shengfu
2013-11-01
The large-capacity servo press is traditionally realized by means of redundant actuation; however, this suffers from over-constraint and interference among actuators, which increase control difficulty and product cost. A new type of press mechanism with parallel topology is presented to develop a mechanical servo press with high stamping capacity. A dynamic model considering gravity counterbalance is proposed based on the virtual work principle, and the effect of the counterbalance cylinder on the dynamic performance of the servo press is studied. It is found that the motor torque required to operate the press is much lower than otherwise when the ratio of the counterbalance force to the gravity of the ram is in the vicinity of 1.0. The stamping force of the real press prototype can reach up to 25 MN at a position 13 mm from the bottom dead center. A typical deep-drawing process with a 1 200 mm stroke at 8 strokes per minute is prescribed by means of a fifth-order polynomial. Under this process condition, the driving torques are calculated from the above dynamic model, and a torque-measuring test is also carried out on the prototype. The calculated torque curve is shown to be consistent with the measured result, with an average error of less than 15%. The parallel mechanism is introduced into the development of the large-capacity servo press to avoid the over-constraint and interference of traditional redundant actuation, and its dynamic characteristics with gravity counterbalance are presented.
An embedded multi-core parallel model for real-time stereo imaging
NASA Astrophysics Data System (ADS)
He, Wenjing; Hu, Jian; Niu, Jingyu; Li, Chuanrong; Liu, Guangyu
2018-04-01
Real-time processing based on embedded systems will enhance the stereo imaging capability of LiDAR and hyperspectral sensors. Work on task partitioning and scheduling strategies for embedded multiprocessor systems started relatively late compared with that for PC computers. In this paper, a parallel model for stereo imaging on an embedded multi-core processing platform is studied and verified. After analyzing the computational load, throughput capacity and buffering requirements, a two-stage pipeline parallel model based on message transmission is established. This model can be applied to fast stereo imaging for airborne sensors with various characteristics. To demonstrate the feasibility and effectiveness of the parallel model, parallel software was designed using test flight data, based on the 8-core DSP processor TMS320C6678. The results indicate that the design performed well in workload distribution and achieved a speed-up ratio of up to 6.4.
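The two-stage pipeline model based on message transmission can be sketched abstractly: stage 1 posts messages to a bounded queue that stage 2 consumes, so successive frames overlap across the stages. The stage names and frame representation below are invented for illustration.

    # Two-stage message-passing pipeline sketch: frame n is processed by
    # stage 2 while frame n+1 is still in stage 1, overlapping the work.
    import queue
    import threading

    q12 = queue.Queue(maxsize=4)           # message channel between stages
    DONE = object()

    def stage1_rectify(frames):
        for f in frames:
            q12.put(("rectified", f))      # pass message to stage 2
        q12.put(DONE)

    def stage2_match(results):
        while True:
            msg = q12.get()
            if msg is DONE:
                return
            results.append(("matched", msg[1]))

    results = []
    t1 = threading.Thread(target=stage1_rectify, args=(range(8),))
    t2 = threading.Thread(target=stage2_match, args=(results,))
    t1.start(); t2.start(); t1.join(); t2.join()
    print(len(results))                    # -> 8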
Global interrupt and barrier networks
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.
2008-10-28
A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One of the multiple independent networks is a global tree network enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
NASA Technical Reports Server (NTRS)
Kimnach, Greg L.; Lebron, Ramon C.; Fox, David A.
1999-01-01
The John H. Glenn Research Center at Lewis Field (GRC) in Cleveland, OH and the Sundstrand Corporation in Rockford, IL have designed and developed an Engineering Model (EM) Electrical Power Control Unit (EPCU) for the Fluids Combustion Facility (FCF) experiments to be flown on the International Space Station (ISS). The EPCU will be used as the power interface to the ISS power distribution system for the FCF's space experiments' test and telemetry hardware. Furthermore, it is proposed to be the common power interface for all experiments. The EPCU is a three-kilowatt 120Vdc-to-28Vdc converter utilizing three independent Power Converter Units (PCUs), each rated at 1 kWe (36Adc @ 28Vdc), which are paralleled and synchronized. Each converter may be fed from one of two ISS power channels. The 28Vdc loads are connected to the EPCU output via 48 solid-state, current-limiting switches, rated at 4Adc each. These switches may be paralleled to supply any given load up to the 108Adc normal operational limit of the paralleled converters. The EPCU was designed in this manner to maximize allocated-power utilization, to shed loads autonomously, to provide fault tolerance, and to provide a flexible power converter and control module to meet various ISS load demands. Tests of the EPCU in the Power Systems Facility testbed at GRC reveal that the overall converted-power efficiency is approximately 89% with a nominal input voltage of 120Vdc and a total load in the range of 40% to 110% of the rated 28Vdc load. (The PCUs alone have an efficiency of approximately 94.5%.) Furthermore, the EM unit passed all flight-qualification-level (and beyond) vibration tests, passed ISS EMI (conducted, radiated, and susceptibility) requirements, successfully operated for extended periods in a thermal/vacuum chamber, was integrated with a proto-flight experiment, and passed all stability and functional requirements.
NASA Astrophysics Data System (ADS)
Kerr, P. C.; Donahue, A. S.; Westerink, J. J.; Luettich, R. A.; Zheng, L. Y.; Weisberg, R. H.; Huang, Y.; Wang, H. V.; Teng, Y.; Forrest, D. R.; Roland, A.; Haase, A. T.; Kramer, A. W.; Taylor, A. A.; Rhome, J. R.; Feyen, J. C.; Signell, R. P.; Hanson, J. L.; Hope, M. E.; Estes, R. M.; Dominguez, R. A.; Dunbar, R. P.; Semeraro, L. N.; Westerink, H. J.; Kennedy, A. B.; Smith, J. M.; Powell, M. D.; Cardone, V. J.; Cox, A. T.
2013-10-01
A Gulf of Mexico performance evaluation and comparison of coastal circulation and wave models was executed through harmonic analyses of tidal simulations, hindcasts of Hurricanes Ike (2008) and Rita (2005), and a benchmarking study. Three unstructured coastal circulation models (ADCIRC, FVCOM, and SELFE) validated with similar skill on a new common Gulf-scale mesh (ULLR) with identical frictional parameterization and forcing for the tidal validation and hurricane hindcasts. Coupled circulation and wave models, SWAN+ADCIRC and WWMII+SELFE, along with FVCOM loosely coupled with SWAN, also validated with similar skill. NOAA's official operational forecast storm surge model (SLOSH) was implemented on local and Gulf-scale meshes with the same wind stress and pressure forcing used by the unstructured models for hindcasts of Ike and Rita. SLOSH's local meshes failed to capture regional processes such as Ike's forerunner, and the results from the Gulf-scale mesh further suggest the shortcomings may be due to a combination of poor mesh resolution, missing internal physics such as tides and nonlinear advection, and SLOSH's internal frictional parameterization. In addition, these models were benchmarked to assess and compare execution speed and scalability for a prototypical operational simulation. It was apparent that a higher number of computational cores is needed for the unstructured models to meet operational implementation requirements similar to SLOSH's, and that some of them could benefit from improved parallelization and faster execution speed.