Operation of high power converters in parallel
NASA Technical Reports Server (NTRS)
Decker, D. K.; Inouye, L. Y.
1993-01-01
High power converters that are used in space power subsystems are limited in power handling capability due to component and thermal limitations. For applications, such as Space Station Freedom, where multi-kilowatts of power must be delivered to user loads, parallel operation of converters becomes an attractive option when considering overall power subsystem topologies. TRW developed three different unequal power sharing approaches for parallel operation of converters. These approaches, known as droop, master-slave, and proportional adjustment, are discussed and test results are presented.
Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
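As a rough illustration of the Otsu step described in this abstract, the sketch below picks the threshold that maximizes between-class variance over a list of integer values and derives a dual threshold from it. The function names, the 256-level quantization, and the `low_ratio=0.5` rule for the lower threshold are assumptions made for illustration; the abstract does not state how the authors derive the two thresholds, and the MapReduce distribution is not shown.

```python
def otsu_threshold(values, levels=256):
    """Return the level t that maximizes the between-class variance
    when values are split into {<= t} and {> t}."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of values in the lower class
    sum0 = 0    # sum of values in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def canny_thresholds(values, low_ratio=0.5):
    """Hypothetical rule: high threshold from Otsu, low threshold as a
    fixed fraction of it (the ratio is an assumption)."""
    high = otsu_threshold(values)
    return low_ratio * high, high
```

In a MapReduce setting, a function like `canny_thresholds` would presumably be called once per image inside the map task, so that each image gets its own thresholds.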
The Flight Deck Perspective of the NASA Langley AILS Concept
NASA Technical Reports Server (NTRS)
Rine, Laura L.; Abbott, Terence S.; Lohr, Gary W.; Elliott, Dawn M.; Waller, Marvin C.; Perry, R. Brad
2000-01-01
Many US airports depend on parallel runway operations to meet the growing demand for day to day operations. In the current airspace system, Instrument Meteorological Conditions (IMC) reduce the capacity of close parallel runway operations; that is, runways spaced closer than 4300 ft. These capacity losses can result in landing delays causing inconveniences to the traveling public, interruptions in commerce, and increased operating costs to the airlines. This document presents the flight deck perspective component of the Airborne Information for Lateral Spacing (AILS) approaches to close parallel runways in IMC. It represents the ideas the NASA Langley Research Center (LaRC) AILS Development Team envisions to integrate a number of components and procedures into a workable system for conducting close parallel runway approaches. An initial documentation of the aspects of this concept was sponsored by LaRC and completed in 1996. Since that time a number of the aspects have evolved to a more mature state. This paper is an update of the earlier documentation.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Evolving binary classifiers through parallel computation of multiple fitness cases.
Cagnoni, Stefano; Bergenti, Federico; Mordonini, Monica; Adorni, Giovanni
2005-06-01
This paper describes two versions of a novel approach to developing binary classifiers, based on two evolutionary computation paradigms: cellular programming and genetic programming. Such an approach achieves high computation efficiency both during evolution and at runtime. Evolution speed is optimized by allowing multiple solutions to be computed in parallel. Runtime performance is optimized explicitly using parallel computation in the case of cellular programming or implicitly taking advantage of the intrinsic parallelism of bitwise operators on standard sequential architectures in the case of genetic programming. The approach was tested on a digit recognition problem and compared with a reference classifier.
Numerical Prediction of CCV in a PFI Engine using a Parallel LES Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ameen, Muhsin M; Mirzaeian, Mohsen; Millo, Federico
Cycle-to-cycle variability (CCV) is detrimental to IC engine operation and can lead to partial burn, misfire, and knock. Predicting CCV numerically is extremely challenging for two key reasons. Firstly, high-fidelity methods such as large eddy simulation (LES) are required to accurately resolve the in-cylinder turbulent flowfield both spatially and temporally. Secondly, CCV is experienced over long timescales and hence the simulations need to be performed for hundreds of consecutive cycles. Ameen et al. (Int. J. Eng. Res., 2017) developed a parallel perturbation model (PPM) approach to dissociate this long time-scale problem into several shorter timescale problems. The strategy is to perform multiple single-cycle simulations in parallel by effectively perturbing the initial velocity field based on the intensity of the in-cylinder turbulence. This strategy was demonstrated for a motored engine and it was shown that the mean and variance of the in-cylinder flowfield were captured reasonably well by this approach. In the present study, this PPM approach is extended to simulate the CCV in a fired port-fuel injected (PFI) SI engine. Two operating conditions are considered: a medium CCV operating case corresponding to 2500 rpm and 16 bar BMEP and a low CCV case corresponding to 4000 rpm and 12 bar BMEP. The predictions from this approach are also shown to be similar to the consecutive LES cycles. Both the consecutive and PPM LES cycles are observed to under-predict the variability in the early stage of combustion. The parallel approach slightly underpredicts the cyclic variability at all stages of combustion as compared to the consecutive LES cycles. However, it is shown that the parallel approach is able to predict the coefficient of variation (COV) of the in-cylinder pressure and burn rate related parameters with sufficient accuracy, and is also able to predict the qualitative trends in CCV with changing operating conditions. The convergence of the statistics predicted by the PPM approach with respect to the number of consecutive cycles required for each parallel simulation is also investigated. It is shown that this new approach is able to give accurate predictions of the CCV in fired engines in less than one-tenth of the time required for the conventional approach of simulating consecutive engine cycles.
An object-oriented approach to nested data parallelism
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.; Chatterjee, Siddhartha
1994-01-01
This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism, however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
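The idea of flattening can be sketched in a few lines. The example below is written in Python rather than the paper's C++ and is not the authors' program transformation: it only illustrates, under the assumption of a simple values-plus-segment-lengths representation, how an elementwise operation over a collection of collections can be turned into a single pass over one flat vector. All function names are hypothetical.

```python
def flatten(nested):
    """Split a collection of collections into a flat value vector
    and a vector of segment lengths."""
    values = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return values, lengths


def unflatten(values, lengths):
    """Rebuild the nested structure from the flat vector."""
    out, pos = [], 0
    for n in lengths:
        out.append(values[pos:pos + n])
        pos += n
    return out


def nested_map(fn, nested):
    """Apply fn to every inner element using one flat elementwise pass,
    which is the part a vector machine could execute in parallel."""
    values, lengths = flatten(nested)
    mapped = [fn(v) for v in values]
    return unflatten(mapped, lengths)
```

For instance, `nested_map(lambda x: x * x, [[1, 2, 3], [], [4, 5]])` squares all five inner elements in one flat pass and then restores the three segments.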
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Cryogenic parallel, single phase flows: an analytical approach
NASA Astrophysics Data System (ADS)
Eichhorn, R.
2017-02-01
Managing the cryogenic flows inside a state-of-the-art accelerator cryomodule has become a demanding endeavour: In order to build highly efficient modules, all heat transfers are usually intercepted at various temperatures. For a multi-cavity module, operated at 1.8 K, this requires intercepts at 4 K and at 80 K at different locations with sometimes strongly varying heat loads which for simplicity reasons are operated in parallel. This contribution will describe an analytical approach, based on optimization theories.
Queueing Network Models for Parallel Processing of Task Systems: an Operational Approach
NASA Technical Reports Server (NTRS)
Mak, Victor W. K.
1986-01-01
Computer performance modeling of possibly complex computations running on highly concurrent systems is considered. Earlier works in this area either dealt with a very simple program structure or resulted in methods with exponential complexity. An efficient procedure is developed to compute the performance measures for series-parallel-reducible task systems using queueing network models. The procedure is based on the concept of hierarchical decomposition and a new operational approach. Numerical results for three test cases are presented and compared to those of simulations.
A Concept for Airborne Precision Spacing for Dependent Parallel Approaches
NASA Technical Reports Server (NTRS)
Barmore, Bryan E.; Baxley, Brian T.; Abbott, Terence S.; Capron, William R.; Smith, Colin L.; Shay, Richard F.; Hubbs, Clay
2012-01-01
The Airborne Precision Spacing concept of operations has been previously developed to support the precise delivery of aircraft landing successively on the same runway. The high-precision and consistent delivery of inter-aircraft spacing allows for increased runway throughput and the use of energy-efficient arrivals routes such as Continuous Descent Arrivals and Optimized Profile Descents. This paper describes an extension to the Airborne Precision Spacing concept to enable dependent parallel approach operations where the spacing aircraft must manage their in-trail spacing from a leading aircraft on approach to the same runway and spacing from an aircraft on approach to a parallel runway. Functionality for supporting automation is discussed as well as procedures for pilots and controllers. An analysis is performed to identify the required information and a new ADS-B report is proposed to support these information needs. Finally, several scenarios are described in detail.
Parallel dispatch: a new paradigm of electrical power system dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang
Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools is of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complex power grids, extend system operators' capabilities, suggest optimal dispatch strategies, and provide decision-making recommendations according to power system operational goals.
Simplified Aircraft-Based Paired Approach: Concept Definition and Initial Analysis
NASA Technical Reports Server (NTRS)
Johnson, Sally C.; Lohr, Gary W.; McKissick, Burnell T.; Abbott, Terence S.; Geurreiro, Nelson M.; Volk, Paul
2013-01-01
Simplified Aircraft-based Paired Approach (SAPA) is an advanced concept proposed by the Federal Aviation Administration (FAA) to support dependent parallel approach operations to runways with lateral spacing closer than 2500 ft. At the request of the FAA, NASA performed an initial assessment of the potential performance and feasibility of the SAPA concept, including developing and assessing an operational implementation of the concept and conducting a Monte Carlo wake simulation study to examine the longitudinal spacing requirements. The SAPA concept was shown to have significant operational advantages in supporting the pairing of aircraft with dissimilar final approach speeds. The wake simulation study showed that support for dissimilar final approach speeds could be significantly enhanced through the use of a two-phased altitude-based longitudinal positioning requirement, with larger longitudinal positioning allowed for higher altitudes out of ground effect and tighter longitudinal positioning defined for altitudes near and in ground effect. While this assessment is preliminary and there are a number of operational issues still to be examined, it has shown the basic SAPA concept to be technically and operationally feasible.
Parallel computation with molecular-motor-propelled agents in nanofabricated networks.
Nicolau, Dan V; Lard, Mercy; Korten, Till; van Delft, Falco C M J M; Persson, Malin; Bengtsson, Elina; Månsson, Alf; Diez, Stefan; Linke, Heiner; Nicolau, Dan V
2016-03-08
The combinatorial nature of many important mathematical problems, including nondeterministic-polynomial-time (NP)-complete problems, places a severe limitation on the problem size that can be solved with conventional, sequentially operating electronic computers. There have been significant efforts in conceiving parallel-computation approaches in the past, for example: DNA computation, quantum computation, and microfluidics-based computation. However, these approaches have not proven, so far, to be scalable and practical from a fabrication and operational perspective. Here, we report the foundations of an alternative parallel-computation system in which a given combinatorial problem is encoded into a graphical, modular network that is embedded in a nanofabricated planar device. Exploring the network in a parallel fashion using a large number of independent, molecular-motor-propelled agents then solves the mathematical problem. This approach uses orders of magnitude less energy than conventional computers, thus addressing issues related to power consumption and heat dissipation. We provide a proof-of-concept demonstration of such a device by solving, in a parallel fashion, the small instance {2, 5, 9} of the subset sum problem, which is a benchmark NP-complete problem. Finally, we discuss the technical advances necessary to make our system scalable with presently available technology.
Simone Blair; Matt Campbell; Tom Lowe; Claire Campbell
2011-01-01
This paper explores the parallels that frequently exist in fire management organizations between operational approaches to fire and engagement approaches in the community. We observe that community issues are often treated in the same way as a fire incident—"controlled" and "contained" through education and "direct attack"...
Parallel Tensor Compression for Large-Scale Scientific Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five-way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memory parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.
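The 8 TB figure quoted in this abstract can be checked with a short calculation. The sketch below assumes 8 bytes per stored value (double precision), which the abstract does not state explicitly; the helper name `tensor_bytes` is hypothetical.

```python
def tensor_bytes(shape, bytes_per_value=8):
    """Storage needed for a dense tensor of the given shape.
    bytes_per_value=8 assumes double-precision values."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value


# Three spatial dimensions of 512 points, 64 variables, 128 time steps.
shape = (512, 512, 512, 64, 128)
raw = tensor_bytes(shape)
print(raw / 2**40)          # size in units of 2**40 bytes
print(raw / 10000 / 2**30)  # size in units of 2**30 bytes at a ratio of 10000
```

Under that assumption the dense tensor occupies exactly 2^43 bytes, i.e. 8 units of 2^40 bytes, and a compression ratio of 10000 would bring it to somewhat under 1 unit of 2^30 bytes.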
NASA Technical Reports Server (NTRS)
Waller, Marvin C.; Scanlon, Charles H.
1999-01-01
A number of our nation's airports depend on closely spaced parallel runway operations to handle their normal traffic throughput when weather conditions are favorable. For safety these operations are curtailed in Instrument Meteorological Conditions (IMC) when the ceiling or visibility deteriorates and operations in many cases are limited to the equivalent of a single runway. Where parallel runway spacing is less than 2500 feet, capacity loss in IMC is on the order of 50 percent for these runways. Clearly, these capacity losses result in landing delays, inconveniences to the public, increased operational cost to the airlines, and general interruption of commerce. This document presents a description and the results of a fixed-base simulation study to evaluate an initial concept that includes a set of procedures for conducting safe flight in closely spaced parallel runway operations in IMC. Consideration of flight-deck information technology and displays to support the procedures is also included in the discussions. The procedures and supporting technology rely heavily on airborne capabilities operating in conjunction with the air traffic control system.
Parallel approach for bioinspired algorithms
NASA Astrophysics Data System (ADS)
Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina
2018-05-01
This paper suggests a probabilistic parallel approach based on a population heuristic such as a genetic algorithm. The authors propose multithreading at the micro level, at which new alternative solutions are generated. On each iteration, several threads can be started that independently use the same population to generate new solutions. After all threads finish, a selection operator combines the obtained results into the new population. To confirm the effectiveness of the suggested approach, the authors developed software with which experimental computations can be carried out. They consider a classic optimization problem: finding a Hamiltonian cycle in a graph. Experiments show that the micro-level parallel approach yields a speed-up on graphs with 250 or more vertices.
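The iteration structure described in this abstract (several threads read the same population, each generates new solutions, and a selection operator merges the results) might be sketched as follows. This is not the authors' software: the toy bit-counting objective stands in for the Hamiltonian-cycle problem, and the crossover, mutation, elitist selection, and all names and parameters are assumptions chosen to keep the example short.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def fitness(bits):
    # Toy objective (count of ones); a stand-in, not the paper's problem.
    return sum(bits)


def make_offspring(population, n_children, seed):
    """Worker run by one thread: reads the shared population and
    returns new candidate solutions without modifying it."""
    rng = random.Random(seed)          # private generator per thread
    children = []
    for _ in range(n_children):
        a, b = rng.sample(population, 2)
        cut = rng.randrange(1, len(a))
        child = a[:cut] + b[cut:]      # one-point crossover
        i = rng.randrange(len(child))
        child[i] = 1 - child[i]        # single-bit mutation
        children.append(child)
    return children


def next_generation(population, n_threads=4, children_per_thread=5, base_seed=0):
    """One iteration: threads generate candidates independently, then a
    selection operator keeps the best individuals (assumed elitist)."""
    with ThreadPoolExecutor(max_workers=n_threads) as executor:
        futures = [
            executor.submit(make_offspring, population, children_per_thread, base_seed + k)
            for k in range(n_threads)
        ]
        candidates = [child for f in futures for child in f.result()]
    merged = population + candidates
    merged.sort(key=fitness, reverse=True)
    return merged[:len(population)]
```

Note that in CPython, threads running pure-Python code like this do not execute in parallel on multiple cores, so the sketch shows the structure of the approach rather than its speed-up.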
An Analysis of the Role of ATC in the AILS Concept
NASA Technical Reports Server (NTRS)
Waller, Marvin C.; Doyle, Thomas M.; McGee, Frank G.
2000-01-01
Airborne information for lateral spacing (AILS) is a concept for making approaches to closely spaced parallel runways in instrument meteorological conditions (IMC). Under the concept, each equipped aircraft will assume responsibility for accurately managing its flight path along the approach course and maintaining separation from aircraft on the parallel approach. This document presents the results of an analysis of the AILS concept from an Air Traffic Control (ATC) perspective. The process has been examined in a step by step manner to determine ATC system support necessary to safely conduct closely spaced parallel approaches using the AILS concept. The analysis resulted in recognizing a number of issues related to integrating the process into the airspace system and proposes operating procedures.
Sharma, Richa; Amitava, Abadan K; Bani, Sadat AO
2014-01-01
Introduction: Minimal access surgery is common in all fields of medicine. We compared a new minimally invasive strabismus surgery (MISS) approach with a standard paralimbal strabismus surgery (SPSS) approach in terms of post-operative course. Materials and Methods: This parallel design study was done on 28 eyes of 14 patients, in which one eye was randomized to MISS and the other to SPSS. MISS was performed by giving two conjunctival incisions parallel to the horizontal rectus muscles and performing recession or resection below the conjunctival strip so obtained. We compared post-operative redness, congestion, chemosis, foreign body sensation (FBS), and drop intolerance (DI) on a graded scale of 0 to 3 on post-operative day 1, at 2-3 weeks, and at 6 weeks. In addition, all scores were added to obtain a total inflammatory score (TIS). Statistical Analysis: Inflammatory scores were analyzed using Wilcoxon's signed rank test. Results: On the first post-operative day, only FBS (P = 0.01) and TIS (P = 0.04) showed a significant difference favoring MISS. At 2-3 weeks, redness (P = 0.04), congestion (P = 0.04), FBS (P = 0.02), and TIS (P = 0.04) were significantly less in the MISS eye. At 6 weeks, only redness (P = 0.04) and TIS (P = 0.05) were significantly less. Conclusion: MISS is more comfortable in the immediate post-operative period and provides better cosmesis in the intermediate period. PMID:24088635
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction to state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer, and the impact of the presented approach on applications in disciplines other than the aerospace industry is assessed.
NASA Astrophysics Data System (ADS)
Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
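For orientation, the operator-splitting iteration mentioned in this abstract is commonly written as below. This is the standard textbook form, not necessarily the exact formulation used in PHOENIX/3D; the two-level source function $S = (1-\epsilon)J + \epsilon B$, the thermal coupling parameter $\epsilon$, and the Planck function $B$ are assumptions introduced here and do not appear in the abstract.

```latex
\begin{align}
  J_{\mathrm{new}} &= \Lambda^{*} S_{\mathrm{new}}
                      + (\Lambda - \Lambda^{*})\, S_{\mathrm{old}}, \\
  \bigl[\,1 - \Lambda^{*}(1-\epsilon)\,\bigr]\, J_{\mathrm{new}}
                   &= \Lambda S_{\mathrm{old}}
                      - \Lambda^{*}(1-\epsilon)\, J_{\mathrm{old}} .
\end{align}
```

The second line follows from the first by substituting the assumed source function for both $S_{\mathrm{new}}$ and $S_{\mathrm{old}}$. It is this linear system in $J_{\mathrm{new}}$, whose matrix inherits the narrow-banded structure of $\Lambda^{*}$, that has to be solved in each iteration step.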
Operator assistant to support deep space network link monitor and control
NASA Technical Reports Server (NTRS)
Cooper, Lynne P.; Desai, Rajiv; Martinez, Elmain
1992-01-01
Preparing the Deep Space Network (DSN) stations to support spacecraft missions (referred to as pre-cal, for pre-calibration) is currently an operator and time intensive activity. Operators are responsible for sending and monitoring several hundred operator directives, messages, and warnings. Operator directives are used to configure and calibrate the various subsystems (antenna, receiver, etc.) necessary to establish a spacecraft link. Messages and warnings are issued by the subsystems upon completion of an operation, changes of status, or an anomalous condition. Some points of pre-cal are logically parallel. Significant time savings could be realized if the existing Link Monitor and Control system (LMC) could support the operator in exploiting the parallelism inherent in pre-cal activities. Currently, operators may work on the individual subsystems in parallel; however, the burden of monitoring these parallel operations resides solely with the operator. Messages, warnings, and directives are all presented as they are received, without being correlated to the event that triggered them. Pre-cal is essentially an overhead activity. During pre-cal, no mission is supported, and no other activity can be performed using the equipment in the link. Therefore, it is highly desirable to reduce pre-cal time as much as possible. One approach to do this, as well as to increase efficiency and reduce errors, is the LMC Operator Assistant (OA). The LMC OA prototype demonstrates an architecture which can be used in concert with the existing LMC to exploit parallelism in pre-cal operations while providing the operators with a true monitoring capability, situational awareness and positive control. This paper presents an overview of the LMC OA architecture and the results from initial prototyping and test activities.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio
Emerging applications such as data mining, bioinformatics, knowledge discovery, and social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructured grids, which generate unpredictable memory accesses. These data structures usually are large, but difficult to partition. These applications mostly are memory bandwidth bounded and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each one of the elements they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processors with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom hand-tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flows, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently from their execution time, to maximize the memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations. A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, Graph Breadth First search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.
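To make the case-study kernel concrete, here is a plain software sketch of a level-synchronous breadth-first search. It is not the synthesized accelerator and says nothing about how the authors' tool schedules hardware; it only marks, in comments, where the per-element task parallelism and the atomic memory operation mentioned in the abstract would arise. The adjacency-dictionary representation and the function name are assumptions.

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency dict {vertex: [neighbours]}.
    Returns the hop distance of every vertex reachable from source."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # Each frontier vertex could be expanded by an independent task;
        # the neighbour reads are the irregular, unpredictable accesses.
        for u in frontier:
            for v in adj[u]:
                # With concurrent tasks, this test-and-set on the visited
                # state would have to be an atomic memory operation.
                if v not in dist:
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist
```

The amount of available parallelism at each step equals the frontier size, which varies unpredictably with the graph, which is why a statically scheduled single execution flow fits this kernel poorly.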
An Approach Using Parallel Architecture to Storage DICOM Images in Distributed File System
NASA Astrophysics Data System (ADS)
Soares, Tiago S.; Prado, Thiago C.; Dantas, M. A. R.; de Macedo, Douglas D. J.; Bauer, Michael A.
2012-02-01
Telemedicine is an important and rapidly expanding area of the medical field, driven by many researchers interested in improving medical applications. In Brazil, in the State of Santa Catarina, a server called the CyclopsDCMServer has been under development since 2005; its purpose is to adopt HDF for the manipulation of medical images (DICOM) using a distributed file system. Since then, much research has been initiated to seek better performance. Our approach adds a parallel implementation of the server's I/O operations, since HDF version 5 supports parallel I/O based on the MPI paradigm, a feature essential to our work. Early experiments using four parallel nodes show good performance when compared to the serial HDF implementation in the CyclopsDCMServer.
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo
2016-07-19
Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture, i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
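For readers unfamiliar with the kernel being parallelized, here is a minimal serial sketch of Smith-Waterman local alignment scoring. It assumes a linear gap penalty and illustrative scoring values (`match=2`, `mismatch=-1`, `gap=-1`), none of which come from the paper, and it is not the authors' vectorized implementation. Each inner-loop iteration is one "cell update", the unit counted by the GCUPS (billion cell updates per second) figure.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of strings a and b (linear gap penalty).

    The scoring parameters are illustrative assumptions.
    """
    previous = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            # One cell update: the score never drops below zero, which is
            # what makes the alignment local rather than global.
            current[j] = max(
                0,
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    print(smith_waterman_score("ACGT", "ACGT"))
```

In a database scan, each query-versus-subject pair is independent, which is what the cluster-level and thread-level parallelism in the abstract exploit; the vector level works inside a single matrix.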
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vinogradov, A. Yu., E-mail: vinogradov-a@ntcees.ru; Gerasimov, A. S.; Kozlov, A. V.
Consideration is given to different approaches to modeling the control systems of gas turbines as a component of CCPP and GTPP to ensure their reliable parallel operation in the UPS of Russia. The disadvantages of the approaches to the modeling of combined-cycle units in studying long-term electromechanical transients accompanied by power imbalance are pointed out. Examples are presented to support the use of more detailed models of gas turbines in electromechanical transient calculations. It is shown that the modern speed control systems of gas turbines in combination with relatively low equivalent inertia have a considerable effect on electromechanical transients, including those caused by disturbances not related to power imbalance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T
2013-01-01
Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O), that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
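The two-phase read described above can be illustrated with a small single-process simulation. This is a sketch under stated assumptions, not SCORPIO's API: the function names, the fixed `group_size`, and the `read_block` callback are all hypothetical, and the grouping rule `rank // group_size` merely mimics what a call to `MPI_Comm_split` with that colour would produce.

```python
def split_into_io_groups(world_size, group_size):
    """Map each rank to (colour, local_rank), as a communicator split
    with colour = rank // group_size would (hypothetical grouping rule)."""
    return {rank: (rank // group_size, rank % group_size)
            for rank in range(world_size)}


def two_phase_read(world_size, group_size, read_block):
    """Simulate a two-phase read; returns (data per rank, number of disk reads).

    `read_block(colour)` stands in for the disk I/O done by a group root and
    must return one item per member of that group.
    """
    layout = split_into_io_groups(world_size, group_size)
    delivered = {}
    disk_reads = 0
    for colour in sorted({c for c, _ in layout.values()}):
        members = [r for r, (c, _) in layout.items() if c == colour]
        # Disk I/O phase: only the group root (local rank 0) touches the file.
        block = read_block(colour)
        disk_reads += 1
        # Communication phase: the root hands each member its piece.
        for rank in members:
            delivered[rank] = block[layout[rank][1]]
    return delivered, disk_reads


if __name__ == "__main__":
    data, reads = two_phase_read(8, 4, lambda c: [c * 4 + i for i in range(4)])
    print(data, reads)
```

With 8 ranks in groups of 4, only 2 processes access the file system instead of 8, which is the contention reduction the approach relies on.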
NASA Technical Reports Server (NTRS)
Hooey, Becky Lee; Gore, Brian Francis; Mahlstedt, Eric; Foyle, David C.
2013-01-01
The objectives of the current research were to develop valid human performance models (HPMs) of approach and land operations; use these models to evaluate the impact of NextGen Closely Spaced Parallel Operations (CSPO) on pilot performance; and draw conclusions regarding flight deck display design and pilot-ATC roles and responsibilities for NextGen CSPO concepts. This document presents guidelines and implications for flight deck display designs and candidate roles and responsibilities. A companion document (Gore, Hooey, Mahlstedt, & Foyle, 2013) provides complete scenario descriptions and results including predictions of pilot workload, visual attention and time to detect off-nominal events.
Parallel Reconstruction Using Null Operations (PRUNO)
Zhang, Jian; Liu, Chunlei; Moseley, Michael E.
2011-01-01
A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated as linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). An iterative conjugate-gradient approach is proposed to efficiently solve for missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, and stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than generalized autocalibrating partially parallel acquisition (GRAPPA), especially at high acceleration rates. With the aid of PRUNO reconstruction, ultra-high-acceleration parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
He, Fu-Liang; Wang, Lei; Yue, Zhen-Dong; Zhao, Hong-Wei; Liu, Fu-Quan
2014-09-07
To evaluate the feasibility of a second parallel transjugular intrahepatic portosystemic shunt (TIPS) to reduce portal venous pressure and control complications of portal hypertension. From January 2011 to December 2012, 10 cirrhotic patients were treated for complications of portal hypertension. The demographic data, operative data, postoperative recovery data, hemodynamic data, and complications were analyzed. Ten patients underwent a primary and parallel TIPS. Technical success rate was 100% with no technical complications. The mean duration of the first operation was 89.20 ± 29.46 min and the second operation was 57.0 ± 12.99 min. The mean portal system pressure decreased from 54.80 ± 4.16 mmHg to 39.0 ± 3.20 mmHg after the primary TIPS and from 44.40 ± 3.95 mmHg to 26.10 ± 4.07 mmHg after the parallel TIPS creation. The mean portosystemic pressure gradient decreased from 43.80 ± 6.18 mmHg to 31.90 ± 2.85 mmHg after the primary TIPS and from 35.60 ± 2.72 mmHg to 15.30 ± 3.27 mmHg after the parallel TIPS creation. Clinical improvement was seen in all patients after the parallel TIPS creation. One patient suffered from transient grade I hepatic encephalopathy (HE) after the primary TIPS and four patients experienced transient grade I-II after the parallel TIPS procedure. Mean hospital stay after the first and second operations were 15.0 ± 3.71 d and 16.90 ± 5.11 d (P = 0.014), respectively. After a mean 14.0 ± 3.13 mo follow-up, ascites and bleeding were well controlled and no stenosis of the stents was found. Parallel TIPS is an effective approach for controlling portal hypertension complications.
NASA Astrophysics Data System (ADS)
Kim, Jae Wook
2013-05-01
This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
Airborne Precision Spacing (APS) Dependent Parallel Arrivals (DPA)
NASA Technical Reports Server (NTRS)
Smith, Colin L.
2012-01-01
The Airborne Precision Spacing (APS) team at the NASA Langley Research Center (LaRC) has been developing a concept of operations to extend the current APS concept to support dependent approaches to parallel or converging runways along with the required pilot and controller procedures and pilot interfaces. A staggered operations capability for the Airborne Spacing for Terminal Arrival Routes (ASTAR) tool was developed and designated as ASTAR10. ASTAR10 has reached a sufficient level of maturity to be validated and tested through a fast-time simulation. The purpose of the experiment was to identify and resolve any remaining issues in the ASTAR10 algorithm, as well as put the concept of operations through a practical test.
Hierarchical analytical and simulation modelling of human-machine systems with interference
NASA Astrophysics Data System (ADS)
Braginsky, M. Ya; Tarakanov, D. V.; Tsapko, S. G.; Tsapko, I. V.; Baglaeva, E. A.
2017-01-01
The article considers the principles of building the analytical and simulation model of the human operator and the industrial control system hardware and software. E-networks as the extension of Petri nets are used as the mathematical apparatus. This approach allows simulating complex parallel distributed processes in human-machine systems. The structural and hierarchical approach is used as the building method for the mathematical model of the human operator. The upper level of the human operator is represented by the logical dynamic model of decision making based on E-networks. The lower level reflects psychophysiological characteristics of the human-operator.
Execution of parallel algorithms on a heterogeneous multicomputer
NASA Astrophysics Data System (ADS)
Isenstein, Barry S.; Greene, Jonathon
1995-04-01
Many aerospace/defense sensing and dual-use applications require high-performance computing, extensive high-bandwidth interconnect and realtime deterministic operation. This paper will describe the architecture of a scalable multicomputer that includes DSP and RISC processors. A single chassis implementation is capable of delivering in excess of 10 GFLOPS of DSP processing power with 2 Gbytes/s of realtime sensor I/O. A software approach to implementing parallel algorithms called the Parallel Application System (PAS) is also presented. An example of applying PAS to a DSP application is shown.
Pernar, Luise I M; Ashley, Stanley W; Smink, Douglas S; Zinner, Michael J; Peyre, Sarah E
2012-01-01
Practicing within the Halstedian model of surgical education, academic surgeons serve dual roles as physicians to their patients and educators of their trainees. Despite this significant responsibility, few surgeons receive formal training in educational theory to inform their practice. The goal of this work was to gain an understanding of how master surgeons approach teaching uncommon and highly complex operations and to determine the educational constructs that frame their teaching philosophies and approaches. Individuals included in the study were queried using electronically distributed open-ended, structured surveys. Responses to the surveys were analyzed and grouped using grounded theory and were examined for parallels to concepts of learning theory. Academic teaching hospital. Twenty-two individuals identified as master surgeons. Twenty-one (95.5%) individuals responded to the survey. Two primary thematic clusters were identified: global approach to teaching (90.5% of respondents) and approach to intraoperative teaching (76.2%). Many of the emergent themes paralleled principles of transfer learning theory outlined in the psychology and education literature. Key elements included: conferring graduated responsibility (57.1%), encouraging development of a mental set (47.6%), fostering or expecting deliberate practice (42.9%), deconstructing complex tasks (38.1%), vertical transfer of information (33.3%), and identifying general principles to structure knowledge (9.5%). Master surgeons employ many of the principles of learning theory when teaching uncommon and highly complex operations. The findings may hold significant implications for faculty development in surgical education. Copyright © 2012 Association of Program Directors in Surgery. Published by Elsevier Inc. All rights reserved.
Multitasking the three-dimensional transport code TORT on CRAY platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.
1996-04-01
The multitasking options in the three-dimensional neutral particle transport code TORT, originally implemented for Cray's CTSS operating system, are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial wall-clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Giorgos, E-mail: garab@math.uoc.gr; Katsoulakis, Markos A., E-mail: markos@math.umass.edu; Plechac, Petr, E-mail: plechac@math.udel.edu
2012-10-01
We present a mathematical framework for constructing and analyzing parallel algorithms for lattice kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physicochemical processes with complex chemistry and transport micro-mechanisms. Rather than focusing on constructing exactly the stochastic trajectories, our approach relies on approximating the evolution of observables, such as density, coverage, correlations and so on. More specifically, we develop a spatial domain decomposition of the Markov operator (generator) that describes the evolution of all observables according to the kinetic Monte Carlo algorithm. This domain decomposition corresponds to a decomposition of the Markov generator into a hierarchy of operators and can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). Based on this operator decomposition, we formulate parallel fractional-step kinetic Monte Carlo algorithms by employing the Trotter Theorem and its randomized variants; these schemes (a) are partially asynchronous on each fractional-step time window, and (b) are characterized by their communication schedule between processors. The proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communication schedules. We carry out a detailed benchmarking of the parallel KMC schemes using available exact solutions, for example, in Ising-type systems, and we demonstrate the capabilities of the method to simulate complex spatially distributed reactions at very large scales on GPUs. Finally, we discuss work load balancing between processors and propose a re-balancing scheme based on probabilistic mass transport methods.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
Comment on Gallistel: behavior theory and information theory: some parallels.
Nevin, John A
2012-05-01
In this article, Gallistel proposes information theory as an approach to some enduring problems in the study of operant and classical conditioning. Copyright © 2012 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Jiang, Xikai; Li, Jiyuan; Zhao, Xujun; Qin, Jian; Karpeev, Dmitry; Hernandez-Ortiz, Juan; de Pablo, Juan J.; Heinonen, Olle
2016-08-01
Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct computational evaluation requires O(N^2) operations, where N is the number of unknowns. Such a scaling, which arises from the many-body nature of the relevant Green's function, has precluded widespread adoption of integral methods for solution of large-scale scientific and engineering problems. In this work, a parallel computational approach is presented that relies on using scalable open source libraries and utilizes a kernel-independent Fast Multipole Method (FMM) to evaluate the integrals in O(N) operations, with O(N) memory cost, thereby substantially improving the scalability and efficiency of computational integral methods. We demonstrate the accuracy, efficiency, and scalability of our approach in the context of two examples. In the first, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space. In the second, we solve an electrostatic problem involving polarizable dielectric bodies in an unbounded dielectric medium. The results from these test cases show that our proposed parallel approach, which is built on a kernel-independent FMM, can enable highly efficient and accurate simulations and allow for considerable flexibility in a broad range of applications.
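The quadratic cost of direct evaluation is easy to see in a small sketch. The code below sums a Coulomb-like 1/r kernel over all pairs of points; the kernel choice, the dropped physical constants, and the function name `direct_potentials` are assumptions made for illustration, and this is the baseline that an FMM replaces, not the FMM itself.

```python
import math


def direct_potentials(positions, charges):
    """Potential at each point due to all other points, using a 1/r kernel.

    Returns (potentials, number of pair evaluations). Physical constants
    are omitted; the 1/r kernel is an illustrative assumption.
    """
    n = len(positions)
    potentials = [0.0] * n
    pair_evaluations = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the singular self-interaction
            potentials[i] += charges[j] / math.dist(positions[i], positions[j])
            pair_evaluations += 1
    return potentials, pair_evaluations


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(direct_potentials(points, [1.0, 1.0, 1.0]))
```

The pair count is N(N-1), so doubling the number of unknowns roughly quadruples the work, whereas the O(N) method described in the abstract would only double it.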
Jiang, Xikai; Li, Jiyuan; Zhao, Xujun; ...
2016-08-10
Large classes of materials systems in physics and engineering are governed by magnetic and electrostatic interactions. Continuum or mesoscale descriptions of such systems can be cast in terms of integral equations, whose direct computational evaluation requires O(N^2) operations, where N is the number of unknowns. Such a scaling, which arises from the many-body nature of the relevant Green's function, has precluded widespread adoption of integral methods for solution of large-scale scientific and engineering problems. In this work, a parallel computational approach is presented that relies on using scalable open source libraries and utilizes a kernel-independent Fast Multipole Method (FMM) to evaluate the integrals in O(N) operations, with O(N) memory cost, thereby substantially improving the scalability and efficiency of computational integral methods. We demonstrate the accuracy, efficiency, and scalability of our approach in the context of two examples. In the first, we solve a boundary value problem for a ferroelectric/ferromagnetic volume in free space. In the second, we solve an electrostatic problem involving polarizable dielectric bodies in an unbounded dielectric medium. Lastly, the results from these test cases show that our proposed parallel approach, which is built on a kernel-independent FMM, can enable highly efficient and accurate simulations and allow for considerable flexibility in a broad range of applications.
Role of the Controller in an Integrated Pilot-Controller Study for Parallel Approaches
NASA Technical Reports Server (NTRS)
Verma, Savvy; Kozon, Thomas; Ballinger, Debbi; Lozito, Sandra; Subramanian, Shobana
2011-01-01
Closely spaced parallel runway operations have been found to increase capacity within the National Airspace System, but poor visibility conditions reduce the use of these operations [1]. Previous research examined the concepts and procedures related to parallel runways [2][4][5]. However, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot's and controller's procedures and information requirements for creating aircraft pairs for closely spaced parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15s (+/- 10s error) at a coupling point that was 12 nmi from the runway threshold. In this paper, the role of the controller, as examined in an integrated study of controllers and pilots, is presented. The controllers utilized a pairing scheduler and new pairing interfaces to help create and maintain aircraft pairs in a high-fidelity, human-in-the-loop simulation experiment. Results show that the controllers worked as a team to achieve pairing between aircraft, and the level of inter-controller coordination increased when the aircraft in the pair belonged to different sectors. Controller feedback did not reveal overreliance on the automation or complacency with the pairing automation or pairing procedures.
Perturbation Experiments: Approaches for Metabolic Pathway Analysis in Bioreactors.
Weiner, Michael; Tröndle, Julia; Albermann, Christoph; Sprenger, Georg A; Weuster-Botz, Dirk
2016-01-01
In the last decades, targeted metabolic engineering of microbial cells has become one of the major tools in bioprocess design and optimization. For successful application, a detailed knowledge is necessary about the relevant metabolic pathways and their regulation inside the cells. Since in vitro experiments cannot display process conditions and behavior properly, process data about the cells' metabolic state have to be collected in vivo. For this purpose, special techniques and methods are necessary. Therefore, most techniques enabling in vivo characterization of metabolic pathways rely on perturbation experiments, which can be divided into dynamic and steady-state approaches. To avoid any process disturbance, approaches which enable perturbation of cell metabolism in parallel to the continuing production process are reasonable. Furthermore, the fast dynamics of microbial production processes amplifies the need of parallelized data generation. These points motivate the development of a parallelized approach for multiple metabolic perturbation experiments outside the operating production reactor. An appropriate approach for in vivo characterization of metabolic pathways is presented and applied exemplarily to a microbial L-phenylalanine production process on a 15 L-scale.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB
NASA Technical Reports Server (NTRS)
Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.
2017-01-01
Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
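The reason the Jacobian is considered embarrassingly parallel is that each column of a finite-difference Jacobian depends on only one perturbed input. The sketch below shows this in Python rather than MATLAB and is not the paper's spmd implementation; the function names, the forward-difference step `h`, and the use of a thread pool are illustrative assumptions. In CPython, threads will not speed up pure-Python arithmetic, so treat the pool only as a picture of how the columns decouple.

```python
from concurrent.futures import ThreadPoolExecutor


def jacobian_column(f, x, j, h=1e-6):
    """Forward-difference approximation of column j of the Jacobian of f at x."""
    x_step = list(x)
    x_step[j] += h
    f0 = f(x)
    f1 = f(x_step)
    return [(b - a) / h for a, b in zip(f0, f1)]


def jacobian(f, x, workers=4):
    """Assemble the Jacobian as a list of rows; columns are computed independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each column is a separate task with no shared mutable state.
        columns = list(pool.map(lambda j: jacobian_column(f, x, j), range(len(x))))
    # Transpose the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]


def example_system(x):
    """Hypothetical test function: f(x) = (x0^2 + x1, 3*x1)."""
    return [x[0] ** 2 + x[1], 3.0 * x[1]]


if __name__ == "__main__":
    print(jacobian(example_system, [1.0, 2.0]))
```

For `example_system` at (1, 2) the exact Jacobian is [[2, 1], [0, 3]]; the forward difference reproduces it up to an error of order `h`.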
Runtime verification of embedded real-time systems.
Reinbacher, Thomas; Függer, Matthias; Brauer, Jörg
We present a runtime verification framework that allows on-line monitoring of past-time Metric Temporal Logic (ptMTL) specifications in a discrete time setting. We design observer algorithms for the time-bounded modalities of ptMTL, which take advantage of the highly parallel nature of hardware designs. The algorithms can be translated into efficient hardware blocks, which are designed for reconfigurability and thus facilitate application of the framework in both the prototyping and the post-deployment phases of embedded real-time systems. We provide formal correctness proofs for all presented observer algorithms and analyze their time and space complexity. For example, for the most general operator considered, the time-bounded Since operator, we obtain a time complexity that is doubly logarithmic both in the point in time the operator is executed and in the operator's time bounds. This result is promising with respect to a self-contained, non-interfering monitoring approach that evaluates real-time specifications in parallel to the system-under-test. We implement our framework on a Field Programmable Gate Array platform and use extensive simulation and logic synthesis runs to assess the benefits of the approach in terms of resource usage and operating frequency.
Enabling CSPA Operations Through Pilot Involvement in Longitudinal Approach Spacing
NASA Technical Reports Server (NTRS)
Battiste, Vernol (Technical Monitor); Pritchett, Amy
2003-01-01
Several major airports around the United States have, or plan to have, closely-spaced parallel runways. This project complemented current and previous research by examining the pilots' ability to control their position longitudinally within their approach stream. This project's results considered spacing for separation from potential positions of wake vortices from the parallel approach. This preventive function could enable CSPA operations to very closely spaced runways. This work also considered how pilot involvement in longitudinal spacing could allow for more efficient traffic flow, by allowing pilots to keep their aircraft within tighter arrival slots than air traffic control (ATC) might be able to establish, and by maintaining space within the arrival stream for corresponding departure slots. To this end, this project conducted several research studies providing an analytic and computational basis for calculating appropriate aircraft spacings, experimental results from a piloted flight simulator test, and an experimental testbed for future simulator tests. The following sections summarize the results of these three efforts.
DOT National Transportation Integrated Search
2014-02-01
The purpose of this memorandum is to provide recommended Total System Error (TSE) models : for aircraft using RNAV (GPS) guidance when analyzing the wake encounter risk of proposed : simultaneous dependent (paired) approach operations to Closel...
NASA Technical Reports Server (NTRS)
Barnes, George H. (Inventor); Lundstrom, Stephen F. (Inventor); Shafer, Philip E. (Inventor)
1983-01-01
A high speed parallel array data processing architecture fashioned under a computational envelope approach includes a data base memory for secondary storage of programs and data, and a plurality of memory modules interconnected to a plurality of processing modules by a connection network of the Omega gender. Programs and data are fed from the data base memory to the plurality of memory modules, and from there the programs are fed through the connection network to the array of processors (one copy of each program for each processor). Execution of the programs occurs with the processors operating normally quite independently of each other in a multiprocessing fashion. For data dependent operations and other suitable operations, all processors are instructed to finish one given task or program branch before all are instructed to proceed in parallel processing fashion on the next instruction. Even when functioning in the parallel processing mode, however, the processors are not in lock-step but execute their own copy of the program individually unless or until another overall processor array synchronization instruction is issued.
Generalized kinetic-neoclassical closure for parallel viscosity in a tokamak.
NASA Astrophysics Data System (ADS)
Smolyakov, A.; Callen, J. D.; Hegna, C.
2000-10-01
We develop a drift-kinetic equation for Chapman-Enskog-type calculations of the parallel viscosity in a tokamak. This approach allows us to uniformly obtain closure relations for the parallel viscosity that include the kinetic effects of wave-particle interactions, such as those of Hammett-Perkins closures, as well as standard neoclassical moment closures induced by collisions and the magnetic field strength variation along field lines. Closures for both these cases can be obtained from our expressions; also, their mutual influences can be investigated. The developed equations allow calculation of parallel viscosity in general kinetic-neoclassical regimes while the main conservation properties remain correct even with an approximate treatment of the collisional operator.
Composing Data Parallel Code for a SPARQL Graph Engine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste
Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
Solving Partial Differential Equations in a data-driven multiprocessor environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.
1988-12-31
Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-flow architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asynchronous methods in a data-driven environment. New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.
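For readers unfamiliar with the Jacobi method mentioned above, the sketch below shows a plain sequential Jacobi relaxation for the 1D Laplace equation with fixed end values. It is not the data-flow implementation from the report; the function name `jacobi_1d` and the test grid are assumptions made for illustration.

```python
def jacobi_1d(u, sweeps):
    """Jacobi relaxation for u'' = 0 on a 1D grid with fixed end values."""
    u = list(u)
    for _ in range(sweeps):
        old = u[:]                 # every update reads only the previous sweep
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (old[i - 1] + old[i + 1])
    return u
```

Because each interior update depends only on values from the previous sweep, all updates within a sweep are independent and could run in parallel. Asynchronous (chaotic) variants relax the requirement that every value comes from the same sweep.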
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.
2014-09-01
We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while the second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio $\chi_\perp/\chi_\parallel$ becomes arbitrarily small, and is shown to capture the correct limiting solution when $\epsilon = \chi_\perp L_\parallel^2 / (\chi_\parallel L_\perp^2) \to 0$ (with $L_\parallel$, $L_\perp$ the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.
Xue, Huan; Hu, Yuantai; Wang, Qing-Ming
2008-09-01
This paper presents a novel approach for designing broadband piezoelectric harvesters by integrating multiple piezoelectric bimorphs (PBs) with different aspect ratios into a system. The effect of two connection patterns among PBs, in series and in parallel, on improving energy harvesting performance is discussed. For ambient vibrations with multifrequency spectra, it is found that: 1) the operating frequency band (OFB) of a harvesting structure can be widened by connecting multiple PBs with different aspect ratios in series; 2) the OFB of a harvesting structure can be shifted to the dominant frequency domain of the ambient vibrations by increasing or decreasing the number of PBs in parallel. Numerical results show that the OFB of the piezoelectric energy harvesting devices can be tailored by the connection patterns (i.e., in series and in parallel) among PBs.
Parallel Anisotropic Tetrahedral Adaptation
NASA Technical Reports Server (NTRS)
Park, Michael A.; Darmofal, David L.
2008-01-01
An adaptive method that robustly produces high aspect ratio tetrahedra to a general 3D metric specification without introducing hybrid semi-structured regions is presented. The elemental operators and higher-level logic are described with their respective domain-decomposed parallelizations. An anisotropic tetrahedral grid adaptation scheme is demonstrated for 1000-1 stretching for a simple cube geometry. This form of adaptation is applicable to more complex domain boundaries via a cut-cell approach as demonstrated by a parallel 3D supersonic simulation of a complex fighter aircraft. To avoid the assumptions and approximations required to form a metric to specify adaptation, an approach is introduced that directly evaluates interpolation error. The grid is adapted to reduce and equidistribute this interpolation error calculation without the use of an intervening anisotropic metric. Direct interpolation error adaptation is illustrated for 1D and 3D domains.
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2014-05-01
Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements, to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g., the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions, enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter-based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
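The 3x3 averaging filter mentioned above can be written either as a per-pixel loop or as a sum of nine shifted copies of the whole array. The sketch below is plain Python, not MIST syntax, and the function names are hypothetical; it only illustrates that the two forms compute the same result, which is the kind of translation the abstract describes.

```python
def mean3x3_loop(img):
    """Per-pixel loop: each interior cell reads its 3x3 neighbourhood."""
    h, w = len(img), len(img[0])
    out = [list(row) for row in img]          # border cells are left unchanged
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            out[r][c] = sum(img[r + dr][c + dc]
                            for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9.0
    return out

def mean3x3_shifted(img):
    """Same filter as nine shifted whole-array slices accumulated together."""
    h, w = len(img), len(img[0])
    acc = [[0.0] * (w - 2) for _ in range(h - 2)]
    for dr in range(3):
        for dc in range(3):
            # One shifted copy of the interior-sized window per (dr, dc) offset.
            shifted = [row[dc:dc + w - 2] for row in img[dr:dr + h - 2]]
            acc = [[a + s for a, s in zip(ra, rs)]
                   for ra, rs in zip(acc, shifted)]
    out = [list(row) for row in img]
    for r in range(h - 2):
        for c in range(w - 2):
            out[r + 1][c + 1] = acc[r][c] / 9.0
    return out
```

Each of the nine accumulation steps in the second form touches the whole interior at once, so it maps naturally onto array-level parallelism.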
High order parallel numerical schemes for solving incompressible flows
NASA Technical Reports Server (NTRS)
Lin, Avi; Milner, Edward J.; Liou, May-Fun; Belch, Richard A.
1992-01-01
The use of parallel computers for numerically solving flow fields has gained much importance in recent years. This paper introduces a new high order numerical scheme for computational fluid dynamics (CFD) specifically designed for parallel computational environments. A distributed MIMD system gives the flexibility of treating different elements of the governing equations with totally different numerical schemes in different regions of the flow field. The parallel decomposition of the governing operator to be solved is the primary parallel split. The primary parallel split was studied using a hypercube-like architecture having clusters of shared memory processors at each node. The approach is demonstrated using examples of simple steady state incompressible flows. Future studies should investigate the secondary split because, depending on the numerical scheme that each of the processors applies and the nature of the flow in the specific subdomain, it may be possible for a processor to seek better, or higher order, schemes for its particular subcase.
GPU-Based Point Cloud Superpositioning for Structural Comparisons of Protein Binding Sites.
Leinweber, Matthias; Fober, Thomas; Freisleben, Bernd
2018-01-01
In this paper, we present a novel approach to solve the labeled point cloud superpositioning problem for performing structural comparisons of protein binding sites. The solution is based on a parallel evolution strategy that operates on large populations and runs on GPU hardware. The proposed evolution strategy reduces the likelihood of getting stuck in a local optimum of the multimodal real-valued optimization problem represented by labeled point cloud superpositioning. The performance of the GPU-based parallel evolution strategy is compared to a previously proposed CPU-based sequential approach for labeled point cloud superpositioning, indicating that the GPU-based parallel evolution strategy leads to qualitatively better results and significantly shorter runtimes, with speed improvements of up to a factor of 1,500 for large populations. Binary classification tests based on the ATP, NADH, and FAD protein subsets of CavBase, a database containing putative binding sites, show average classification rate improvements from about 92 percent (CPU) to 96 percent (GPU). Further experiments indicate that the proposed GPU-based labeled point cloud superpositioning approach can be superior to traditional protein comparison approaches based on sequence alignments.
Wake vortex capacity benefits for simultaneous approaches at St. Louis Airport
DOT National Transportation Integrated Search
1994-06-27
This paper details the results of FTA's investigation into the potential capacity gains of applying 1.5 nautical mile (NM) diagonal separation between parallel arrival operations at St. Louis Lambert International Airport (STL). Currently, dependent ...
Simplified Parallel Domain Traversal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson III, David J
2011-01-01
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at scale. In order to deliver both simplicity to users as well as scalability on HPC platforms, we introduce a novel two-tiered communication architecture for managing and exploiting asynchronous communication loads. We also integrate our design with advanced parallel I/O techniques that operate directly on native simulation output. We demonstrate DStep by performing teleconnection analysis across ensemble runs of terascale atmospheric CO2 and climate data, and we show scalability results on up to 65,536 IBM BlueGene/P cores.
A novel visual hardware behavioral language
NASA Technical Reports Server (NTRS)
Li, Xueqin; Cheng, H. D.
1992-01-01
Most hardware behavioral languages use only text to describe the behavior of the desired hardware design. This is inconvenient for VLSI designers who prefer the schematic approach. The proposed visual hardware behavioral language has the ability to graphically express design information using visual parallel models (blocks), visual sequential models (processes) and visual data flow graphs (which consist of primitive operational icons, control icons, and Data and Synchro links). Thus, the proposed visual hardware behavioral language can not only specify hardware concurrent and sequential functionality, but can also visually expose parallelism, sequentiality, and disjointness (mutually exclusive operations) for the hardware designers. This allows hardware designers to capture design ideas easily and explicitly using the visual hardware behavioral language.
On the Use of Kronecker Operators for the Solution of Generalized Stochastic Petri Nets
NASA Technical Reports Server (NTRS)
Ciardo, Gianfranco; Tilgner, Marco
1996-01-01
We discuss how to describe the Markov chain underlying a generalized stochastic Petri net using Kronecker operators on smaller matrices. We extend previous approaches by allowing both an extensive type of marking-dependent behavior for the transitions and the presence of immediate synchronizations. The derivation of the results is thoroughly formalized, including the use of Kronecker operators in the treatment of the vanishing markings and the computation of impulse-based reward measures. We use our techniques to analyze a model whose solution using conventional methods would fail because of the state-space explosion. In the conclusion, we point out ideas to parallelize our approach.
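As background for the Kronecker-operator description above, the sketch below builds the generator of two independent continuous-time Markov components as the Kronecker sum Q1 ⊗ I + I ⊗ Q2. This covers only the simplest case of independent components, and does not include the synchronization or marking-dependent terms treated in the paper; the helper names `kron`, `identity`, and `kron_sum` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def identity(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron_sum(q1, q2):
    """Generator of two independent components: Q1 (x) I + I (x) Q2."""
    k1 = kron(q1, identity(len(q2)))
    k2 = kron(identity(len(q1)), q2)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(k1, k2)]
```

The joint matrix has size len(q1) * len(q2), yet only the two small factors need to be stored, which is the memory advantage that motivates Kronecker-based descriptions.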
Use Hierarchical Storage and Analysis to Exploit Intrinsic Parallelism
NASA Astrophysics Data System (ADS)
Zender, C. S.; Wang, W.; Vicente, P.
2013-12-01
Big Data is an ugly name for the scientific opportunities and challenges created by the growing wealth of geoscience data. How to weave large, disparate datasets together to best reveal their underlying properties, to exploit their strengths and minimize their weaknesses, to continually aggregate more information than the world knew yesterday and less than we will learn tomorrow? Data analytics techniques (statistics, data mining, machine learning, etc.) can accelerate pattern recognition and discovery. However, researchers must often, prior to analysis, organize multiple related datasets into a coherent framework. Hierarchical organization permits entire datasets to be stored in nested groups that reflect their intrinsic relationships and similarities. Hierarchical data can be simpler and faster to analyze by coding operators to automatically parallelize processes over isomorphic storage units, i.e., groups. The newest generation of netCDF Operators (NCO) embodies this hierarchical approach, while still supporting traditional analysis approaches. We will use NCO to demonstrate the trade-offs involved in processing a prototypical Big Data application (analysis of CMIP5 datasets) using hierarchical and traditional analysis approaches.
Biocellion: accelerating computer simulation of multicellular biological system models
Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya
2014-01-01
Motivation: Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. Results: We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Availability and implementation: Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. Contact: seunghwa.kang@pnnl.gov PMID:25064572
Multi-aircraft dynamics, navigation and operation
NASA Astrophysics Data System (ADS)
Houck, Sharon Wester
Air traffic control stands on the brink of a revolution. Fifty years from now, we will look back and marvel that we ever flew by radio beacons and radar alone, much as we now marvel that early aviation pioneers flew by chronometer and compass alone. The microprocessor, satellite navigation systems, and air-to-air data links are the technical keys to this revolution. Many airports are near or at capacity now for at least portions of the day, making it clear that major increases in airport capacity will be required in order to support the projected growth in air traffic. This can be accomplished by adding airports, adding runways at existing airports, or increasing the capacity of the existing runways. Technology that allows use of ultra closely spaced (750 ft to 2500 ft) parallel approaches would greatly reduce the environmental impact of airport capacity increases. This research tackles the problem of multi-aircraft dynamics, navigation, and operation, specifically in the terminal area, and presents new findings on how ultra closely spaced parallel approaches may be accomplished. The underlying approach considers how multiple aircraft are flown in visual conditions, where spacing criteria are much less stringent, and then uses this data to study the critical parameters for collision avoidance during an ultra closely spaced parallel approach. Also included are experimental and analytical investigations of advanced guidance systems that are critical components of precision approaches. Together, these investigations form a novel approach to the design and analysis of parallel approaches for runways spaced less than 2500 ft apart. This research has concluded that it is technically feasible to reduce the required runway spacing during simultaneous instrument approaches to less than the current minimum of 3400 ft with the use of advanced navigation systems while maintaining the currently accepted levels of safety.
On a smooth day with both pilots flying a tunnel-in-the-sky display and being guided by a Category I LAAS, it is technically feasible to reduce the runway spacing to 1100 ft. If a Category I LAAS and an "intelligent auto-pilot" that executes both the approach and emergency escape maneuver are used, the technically achievable required runway spacing is reduced to 750 ft. Both statements presume full aircraft state information, including position, velocity, and attitude, is being reliably passed between aircraft at a rate equal to or greater than one Hz.
NASA Astrophysics Data System (ADS)
Larour, Eric; Utke, Jean; Bovin, Anton; Morlighem, Mathieu; Perez, Gilberto
2016-11-01
Within the framework of sea-level rise projections, there is a strong need for hindcast validation of the evolution of polar ice sheets in a way that tightly matches observational records (from radar, gravity, and altimetry observations mainly). However, the computational requirements for making hindcast reconstructions possible are severe and rely mainly on the evaluation of the adjoint state of transient ice-flow models. Here, we look at the computation of adjoints in the context of the NASA/JPL/UCI Ice Sheet System Model (ISSM), written in C++ and designed for parallel execution with MPI. We present the adaptations required in the way the software is designed and written, but also generic adaptations in the tools facilitating the adjoint computations. We concentrate on the use of operator overloading coupled with the AdjoinableMPI library to achieve the adjoint computation of the ISSM. We present a comprehensive approach to (1) carry out type changing through the ISSM, hence facilitating operator overloading, (2) bind to external solvers such as MUMPS and GSL-LU, and (3) handle MPI-based parallelism to scale the capability. We demonstrate the success of the approach by computing sensitivities of hindcast metrics such as the misfit to observed records of surface altimetry on the northeastern Greenland Ice Stream, or the misfit to observed records of surface velocities on Upernavik Glacier, central West Greenland. We also provide metrics for the scalability of the approach, and the expected performance. This approach has the potential to enable a new generation of hindcast-validated projections that make full use of the wealth of datasets currently being collected, or already collected, in Greenland and Antarctica.
NASA Astrophysics Data System (ADS)
Perez, G. L.; Larour, E. Y.; Morlighem, M.
2016-12-01
Within the framework of sea-level rise projections, there is a strong need for hindcast validation of the evolution of polar ice sheets in a way that tightly matches observational records (from radar and altimetry observations mainly). However, the computational requirements for making hindcast reconstructions possible are severe and rely mainly on the evaluation of the adjoint state of transient ice-flow models. Here, we look at the computation of adjoints in the context of the NASA/JPL/UCI Ice Sheet System Model, written in C++ and designed for parallel execution with MPI. We present the adaptations required in the way the software is designed and written but also generic adaptations in the tools facilitating the adjoint computations. We concentrate on the use of operator overloading coupled with the AdjoinableMPI library to achieve the adjoint computation of ISSM. We present a comprehensive approach to 1) carry out type changing through ISSM, hence facilitating operator overloading, 2) bind to external solvers such as MUMPS and GSL-LU and 3) handle MPI-based parallelism to scale the capability. We demonstrate the success of the approach by computing sensitivities of hindcast metrics such as the misfit to observed records of surface altimetry on the North-East Greenland Ice Stream, or the misfit to observed records of surface velocities on Upernavik Glacier, Central West Greenland. We also provide metrics for the scalability of the approach, and the expected performance. This approach has the potential of enabling a new generation of hindcast-validated projections that make full use of the wealth of datasets currently being collected, or already collected, in Greenland and Antarctica, such as surface altimetry, surface velocities, and/or gravity measurements.
A microfluidic approach to parallelized transcriptional profiling of single cells.
Sun, Hao; Olsen, Timothy; Zhu, Jing; Tao, Jianguo; Ponnaiya, Brian; Amundson, Sally A; Brenner, David J; Lin, Qiao
2015-12-01
The ability to correlate single-cell genetic information with cellular phenotypes is of great importance to biology and medicine, as it holds the potential to gain insight into disease pathways that is unavailable from ensemble measurements. We present a microfluidic approach to parallelized, rapid, quantitative analysis of messenger RNA from single cells via RT-qPCR. The approach leverages an array of single-cell RT-qPCR analysis units formed by a set of parallel microchannels concurrently controlled by elastomeric pneumatic valves, thereby enabling parallelized handling and processing of single cells in a drastically simplified operation procedure using a relatively small number of microvalves. All steps for single-cell RT-qPCR, including cell isolation and immobilization, cell lysis, mRNA purification, reverse transcription and qPCR, are integrated on a single chip, eliminating the need for off-chip manual cell and reagent transfer and qPCR amplification as commonly used in existing approaches. Additionally, the approach incorporates optically transparent microfluidic components to allow monitoring of single-cell trapping without the need for molecular labeling that can potentially alter the targeted gene expression and utilizes a polycarbonate film as a barrier against evaporation to minimize the loss of reagents at elevated temperatures during the analysis. We demonstrate the utility of the approach by the transcriptional profiling for the induction of the cyclin-dependent kinase inhibitor 1a and the glyceraldehyde 3-phosphate dehydrogenase in single cells from the MCF-7 breast cancer cell line. Furthermore, the methyl methanesulfonate is employed to allow measurement of the expression of the genes in individual cells responding to a genotoxic stress.
Distribution Locational Real-Time Pricing Based Smart Building Control and Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hao, Jun; Dai, Xiaoxiao; Zhang, Yingchen
This paper proposes a real-virtual parallel computing scheme for smart building operations aiming at augmenting overall social welfare. The University of Denver's campus power grid and Ritchie fitness center are used for demonstrating the proposed approach. An artificial virtual system is built in parallel to the real physical system to evaluate the overall social cost of the building operation based on the social science based working productivity model, numerical experiment based building energy consumption model and the power system based real-time pricing mechanism. Through interactive feedback exchanged between the real and virtual system, enlarged social welfare, including monetary cost reduction and energy saving, as well as working productivity improvements, can be achieved.
Performance Evaluation of Evasion Maneuvers for Parallel Approach Collision Avoidance
NASA Technical Reports Server (NTRS)
Winder, Lee F.; Kuchar, James K.; Waller, Marvin (Technical Monitor)
2000-01-01
Current plans for independent instrument approaches to closely spaced parallel runways call for an automated pilot alerting system to ensure separation of aircraft in the case of a "blunder," or unexpected deviation from a normal approach path. Resolution advisories by this system would require the pilot of an endangered aircraft to perform a trained evasion maneuver. The potential performance of two evasion maneuvers, referred to as the "turn-climb" and "climb-only," was estimated using an experimental NASA alerting logic (AILS) and a computer simulation of relative trajectory scenarios between two aircraft. One aircraft was equipped with the NASA alerting system, and maneuvered accordingly. Observation of the rates of different types of alerting failure allowed judgement of evasion maneuver performance. System Operating Characteristic (SOC) curves were used to assess the benefit of alerting with each maneuver.
Price, Anthony N.; Padormo, Francesco; Hajnal, Joseph V.; Malik, Shaihan J.
2017-01-01
Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field ($B_1^+$) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a ‘sequence‐level’ optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady‐state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight‐channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single‐channel operation, a mean‐squared‐error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. PMID:28195684
Beqiri, Arian; Price, Anthony N; Padormo, Francesco; Hajnal, Joseph V; Malik, Shaihan J
2017-06-01
Cardiac magnetic resonance imaging (MRI) at high field presents challenges because of the high specific absorption rate and significant transmit field ($B_1^+$) inhomogeneities. Parallel transmission MRI offers the ability to correct for both issues at the level of individual radiofrequency (RF) pulses, but must operate within strict hardware and safety constraints. The constraints are themselves affected by sequence parameters, such as the RF pulse duration and TR, meaning that an overall optimal operating point exists for a given sequence. This work seeks to obtain optimal performance by performing a 'sequence-level' optimization in which pulse sequence parameters are included as part of an RF shimming calculation. The method is applied to balanced steady-state free precession cardiac MRI with the objective of minimizing TR, hence reducing the imaging duration. Results are demonstrated using an eight-channel parallel transmit system operating at 3 T, with an in vivo study carried out on seven male subjects of varying body mass index (BMI). Compared with single-channel operation, a mean-squared-error shimming approach leads to reduced imaging durations of 32 ± 3% with simultaneous improvement in flip angle homogeneity of 32 ± 8% within the myocardium. © 2017 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.
Broadcasting collective operation contributions throughout a parallel computer
Faraj, Ahmad [Rochester, MN
2012-02-21
Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications.
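The two-stage scheme described above (intra-node sharing first, then inter-node transmission in a serial per-processor order) can be mimicked with a small sequential simulation. This is only a sketch of the data movement, not the disclosed implementation: no real network or MPI calls are involved, and `broadcast_contributions` and its list-based data layout are hypothetical.

```python
def broadcast_contributions(nodes):
    """nodes[n][p] is the contribution of processor p on compute node n.
    Returns held[n][p]: everything that processor holds after both stages."""
    # Stage 1 (intra-node): each processor shares its contribution with
    # the other processors on its own node.
    held = [[list(node) for _ in node] for node in nodes]
    # Stage 2 (inter-node): the processors of each node take turns, in a
    # fixed serial order, sending their contribution to every processor
    # on the other nodes.
    for src, node in enumerate(nodes):
        for value in node:                 # serial processor transmission sequence
            for dst, peers in enumerate(held):
                if dst != src:
                    for inbox in peers:
                        inbox.append(value)
    return held
```

After both stages, every processor holds the contribution of every other processor in the machine, though the arrival order differs from node to node.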
Biocellion: accelerating computer simulation of multicellular biological system models.
Kang, Seunghwa; Kahan, Simon; McDermott, Jason; Flann, Nicholas; Shmulevich, Ilya
2014-11-01
Biological system behaviors are often the outcome of complex interactions among a large number of cells and their biotic and abiotic environment. Computational biologists attempt to understand, predict and manipulate biological system behavior through mathematical modeling and computer simulation. Discrete agent-based modeling (in combination with high-resolution grids to model the extracellular environment) is a popular approach for building biological system models. However, the computational complexity of this approach forces computational biologists to resort to coarser resolution approaches to simulate large biological systems. High-performance parallel computers have the potential to address the computing challenge, but writing efficient software for parallel computers is difficult and time-consuming. We have developed Biocellion, a high-performance software framework, to solve this computing challenge using parallel computers. To support a wide range of multicellular biological system models, Biocellion asks users to provide their model specifics by filling the function body of pre-defined model routines. Using Biocellion, modelers without parallel computing expertise can efficiently exploit parallel computers with less effort than writing sequential programs from scratch. We simulate cell sorting, microbial patterning and a bacterial system in soil aggregate as case studies. Biocellion runs on x86 compatible systems with the 64 bit Linux operating system and is freely available for academic use. Visit http://biocellion.com for additional information. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
A study of DC-DC converters with MCT's for arcjet power supplies
NASA Technical Reports Server (NTRS)
Stuart, Thomas A.
1994-01-01
Many arcjet DC power supplies use PWM full bridge converters with large arrays of parallel FET's. This report investigates an alternative supply using a variable frequency series resonant converter with small arrays of parallel MCT's (metal oxide semiconductor controlled thyristors). The reasons for this approach are to: increase reliability by reducing the number of switching devices; and decrease the surface mounting area of the switching arrays. The variable frequency series resonant approach is used because the relatively slow switching speed of the MCT precludes the use of PWM. The 10 kW converter operated satisfactorily with an efficiency of over 91 percent. Test results indicate this efficiency could be increased further by additional optimization of the series resonant inductor.
Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis
NASA Technical Reports Server (NTRS)
Guerreiro, Nelson M.; Neitzke, Kurt W.
2012-01-01
A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations. A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.
Efficient Scalable Median Filtering Using Histogram-Based Operations.
Green, Oded
2018-05-01
Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted value. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, and , comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
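The contrast between sorting-based and histogram-based filtering can be illustrated with a small sketch. The code below is not the algorithm from this paper; it is the classic sliding-histogram idea reduced to one dimension, assuming integer samples in the range 0 to 255 and edge replication at the borders. The function name `median_filter_1d` and its parameters are hypothetical.

```python
def median_filter_1d(values, radius, levels=256):
    """Sliding-window median of integer samples in [0, levels), no sorting.

    A histogram of the current window is kept; moving the window by one
    sample costs one decrement and one increment instead of a fresh sort.
    """
    n = len(values)

    def at(i):
        # Edge replication: indices outside the signal reuse the border sample.
        return values[min(max(i, 0), n - 1)]

    hist = [0] * levels
    for i in range(-radius, radius + 1):
        hist[at(i)] += 1

    out = []
    for center in range(n):
        # The median is the first level whose cumulative count exceeds radius,
        # i.e. the (radius + 1)-th smallest of the 2 * radius + 1 samples.
        count = 0
        for level in range(levels):
            count += hist[level]
            if count > radius:
                out.append(level)
                break
        # Slide the window: drop the leftmost sample, add the next one.
        hist[at(center - radius)] -= 1
        hist[at(center + radius + 1)] += 1
    return out


if __name__ == "__main__":
    noisy = [10, 200, 12, 11, 13]
    print(median_filter_1d(noisy, radius=1))
```

The cost per output sample depends on the number of intensity levels rather than on the window size, which is why histogram approaches tend to pay off as windows grow, while small windows can still favor comparison-based methods.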
NASA Technical Reports Server (NTRS)
Jost, Gabriele; Labarta, Jesus; Gimenez, Judit
2004-01-01
With the current trend in parallel computer architectures towards clusters of shared memory symmetric multi-processors, parallel programming techniques have evolved that support parallelism beyond a single level. When comparing the performance of applications based on different programming paradigms, it is important to differentiate between the influence of the programming model itself and other factors, such as implementation-specific behavior of the operating system (OS) or architectural issues. Rewriting a large scientific application in order to employ a new programming paradigm is usually a time-consuming and error-prone task. Before embarking on such an endeavor it is important to determine that there is really a gain that would not be possible with the current implementation. A detailed performance analysis is crucial to clarify these issues. The multilevel programming paradigms considered in this study are hybrid MPI/OpenMP, MLP, and nested OpenMP. The hybrid MPI/OpenMP approach is based on using MPI [7] for the coarse grained parallelization and OpenMP [9] for fine grained loop level parallelism. The MPI programming paradigm assumes a private address space for each process. Data is transferred by explicitly exchanging messages via calls to the MPI library. This model was originally designed for distributed memory architectures but is also suitable for shared memory systems. The second paradigm under consideration is MLP, which was developed by Taft. The approach is similar to MPI/OpenMP, using a mix of coarse grain process level parallelization and loop level OpenMP parallelization. As is the case with MPI, a private address space is assumed for each process. The MLP approach was developed for ccNUMA architectures and explicitly takes advantage of the availability of shared memory. A shared memory arena which is accessible by all processes is required. Communication is done by reading from and writing to the shared memory.
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems
NASA Technical Reports Server (NTRS)
Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.
2000-01-01
The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many subdomains called blocks, and solve the governing equations over these blocks. The dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the changes in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA based code ADPAC to demonstrate the developed tools for dynamic load balancing.
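To make the block-distribution problem concrete, here is a minimal sketch of one way blocks could be assigned to processors of unequal speed. It is a generic greedy heuristic (largest block first, placed where it finishes earliest), not the tools described in this abstract, and it ignores communication cost. The names `assign_blocks`, `block_costs`, and `speeds` are hypothetical, as are the sample numbers.

```python
def assign_blocks(block_costs, speeds):
    """Greedy assignment of blocks to heterogeneous processors.

    block_costs[i] is the work in block i (arbitrary units) and speeds[p]
    is the work per unit time that processor p can deliver. Blocks are
    placed largest first on the processor that would finish them earliest.
    """
    loads = [0.0] * len(speeds)  # estimated busy time per processor
    assignment = {p: [] for p in range(len(speeds))}
    ordered = sorted(enumerate(block_costs), key=lambda item: -item[1])
    for block, cost in ordered:
        best = min(range(len(speeds)),
                   key=lambda p: loads[p] + cost / speeds[p])
        loads[best] += cost / speeds[best]
        assignment[best].append(block)
    return assignment, loads


if __name__ == "__main__":
    # Four blocks, one processor twice as fast as the other (made-up values).
    assignment, loads = assign_blocks([8, 4, 4, 2], [2.0, 1.0])
    print(assignment, loads)
```

A dynamic balancer would rerun an assignment of this kind whenever measured speeds or loads drift during execution, which is the situation the abstract describes for shared workstation clusters.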
Maji, Kaushik; Kouri, Donald J
2011-03-28
We have developed a new method for solving quantum dynamical scattering problems, using the time-independent Schrödinger equation (TISE), based on a novel method to generalize a "one-way" quantum mechanical wave equation, impose correct boundary conditions, and eliminate exponentially growing closed channel solutions. The approach is readily parallelized to achieve approximate N^2 scaling, where N is the number of coupled equations. The full two-way nature of the TISE is included while propagating the wave function in the scattering variable and the full S-matrix is obtained. The new algorithm is based on a "Modified Cayley" operator splitting approach, generalizing earlier work where the method was applied to the time-dependent Schrödinger equation. All scattering variable propagation approaches to solving the TISE involve solving a Helmholtz-type equation, and for more than one degree of freedom, these are notoriously ill-behaved, due to the unavoidable presence of exponentially growing contributions to the numerical solution. Traditionally, the method used to eliminate exponential growth has posed a major obstacle to the full parallelization of such propagation algorithms. We stabilize by using the Feshbach projection operator technique to remove all the nonphysical exponentially growing closed channels, while retaining all of the propagating open channel components, as well as exponentially decaying closed channel components.
NASA Technical Reports Server (NTRS)
Cassell, Rick; Smith, Alex; Connors, Mary; Wojciech, Jack; Rosekind, Mark R. (Technical Monitor)
1996-01-01
As new technologies and procedures are introduced into the National Airspace System, whether they are intended to improve efficiency, capacity, or safety level, the quantification of potential changes in safety levels is of vital concern. Applications of technology can improve safety levels and allow the reduction of separation standards. An excellent example is the Precision Runway Monitor (PRM). By taking advantage of the surveillance and display advances of PRM, airports can run instrument parallel approaches to runways separated by 3400 feet with the same level of safety as parallel approaches to runways separated by 4300 feet using the standard technology. Despite a wealth of information from flight operations and testing programs, there is no readily quantifiable relationship between numerical safety levels and the separation standards that apply to aircraft on final approach. This paper presents a modeling approach to quantify the risk associated with reducing separation on final approach. Reducing aircraft separation, both laterally and longitudinally, has been the goal of several aviation R&D programs over the past several years. Many of these programs have focused on technological solutions to improve navigation accuracy, surveillance accuracy, aircraft situational awareness, controller situational awareness, and other technical and operational factors that are vital to maintaining flight safety. The risk assessment model relates different types of potential aircraft accidents and incidents and their contribution to overall accident risk. The framework links accident risks to a hierarchy of failsafe mechanisms characterized by procedures and interventions. The model will be used to assess the overall level of safety associated with reducing separation standards and the introduction of new technology and procedures, as envisaged under the Free Flight concept. 
The model framework can be applied to various aircraft scenarios, including parallel and in-trail approaches. This research was performed under contract to NASA and in cooperation with the FAA's Safety Division (ASY).
Maximal clique enumeration with data-parallel primitives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lessley, Brenton; Perciano, Talita; Mathai, Manish
The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-time speedup and 9-time speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attain additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.
A parallel data management system for large-scale NASA datasets
NASA Technical Reports Server (NTRS)
Srivastava, Jaideep
1993-01-01
The past decade has experienced a phenomenal growth in the amount of data and resultant information generated by NASA's operations and research projects. A key application is the reprocessing problem which has been identified to require data management capabilities beyond those available today (PRAT93). The Intelligent Information Fusion (IIF) system (ROEL91) is an ongoing NASA project which has similar requirements. Deriving our understanding of NASA's future data management needs based on the above, this paper describes an approach to using parallel computer systems (processor and I/O architectures) to develop an efficient parallel database management system to address the needs. Specifically, we propose to investigate issues in low-level record organizations and management, complex query processing, and query compilation and scheduling.
Bellucci, Michael A; Coker, David F
2011-07-28
We describe a new method for constructing empirical valence bond potential energy surfaces using a parallel multilevel genetic program (PMLGP). Genetic programs can be used to perform an efficient search through function space and parameter space to find the best functions and sets of parameters that fit energies obtained by ab initio electronic structure calculations. Building on the traditional genetic program approach, the PMLGP utilizes a hierarchy of genetic programming on two different levels. The lower level genetic programs are used to optimize coevolving populations in parallel while the higher level genetic program (HLGP) is used to optimize the genetic operator probabilities of the lower level genetic programs. The HLGP allows the algorithm to dynamically learn the mutation or combination of mutations that most effectively increase the fitness of the populations, causing a significant increase in the algorithm's accuracy and efficiency. The algorithm's accuracy and efficiency are tested against a standard parallel genetic program with a variety of one-dimensional test cases. Subsequently, the PMLGP is utilized to obtain an accurate empirical valence bond model for proton transfer in 3-hydroxy-gamma-pyrone in gas phase and protic solvent. © 2011 American Institute of Physics
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Coordinated Parallel Runway Approaches
NASA Technical Reports Server (NTRS)
Koczo, Steve
1996-01-01
The current air traffic environment in airport terminal areas experiences substantial delays when weather conditions deteriorate to Instrument Meteorological Conditions (IMC). Expected future increases in air traffic will put additional pressures on the National Airspace System (NAS) and will further compound the high costs associated with airport delays. To address this problem, NASA has embarked on a program to address Terminal Area Productivity (TAP). The goals of the TAP program are to provide increased efficiencies in air traffic during the approach, landing, and surface operations in low-visibility conditions. The ultimate goal is to achieve efficiencies of terminal area flight operations commensurate with Visual Meteorological Conditions (VMC) at current or improved levels of safety.
Simulator Evaluation of Airborne Information for Lateral Spacing (AILS) Concept
NASA Technical Reports Server (NTRS)
Abbott, Terence S.; Elliott, Dawn M.
2001-01-01
The Airborne Information for Lateral Spacing (AILS) concept is designed to support independent parallel approach operations to runways spaced as close as 2500 ft. This report describes the AILS operational concept and the results of a ground-based flight simulation experiment of one implementation of this concept. The focus of this simulation experiment was to evaluate pilot performance, pilot acceptability, and minimum miss-distances for the rare situation in which an aircraft on one approach intrudes into the path of an aircraft on the other approach. Results from this study showed that the design-goal mean miss-distance of 1200 ft to potential collision situations was surpassed with an actual mean miss-distance of 2236 ft. Pilot reaction times to the alerting system, which was an operational concern, averaged 1.11 sec, well below the design-goal reaction time of 2.0 sec. These quantitative results and pilot subjective data showed that the AILS concept is reasonable from an operational standpoint.
NASA Astrophysics Data System (ADS)
Chacon, Luis; Del-Castillo-Negrete, Diego; Hauck, Cory
2012-10-01
Modeling electron transport in magnetized plasmas is extremely challenging due to the extreme anisotropy between parallel (to the magnetic field) and perpendicular directions (χ∥/χ⊥ ~ 10^10 in fusion plasmas). Recently, a Lagrangian Green's function approach, developed for the purely parallel transport case [D. del-Castillo-Negrete, L. Chacón, PRL, 106, 195004 (2011); D. del-Castillo-Negrete, L. Chacón, Phys. Plasmas, 19, 056112 (2012)], has been extended to the anisotropic transport case in the tokamak-ordering limit with constant density [L. Chacón, D. del-Castillo-Negrete, C. Hauck, JCP, submitted (2012)]. An operator-split algorithm is proposed that allows one to treat Eulerian and Lagrangian components separately. The approach is shown to feature bounded numerical errors for arbitrary χ∥/χ⊥ ratios, which renders it asymptotic-preserving. In this poster, we will present the generalization of the Lagrangian approach to arbitrary magnetic fields. We will demonstrate the potential of the approach with various challenging configurations, including the case of transport across a magnetic island in cylindrical geometry.
Analytical Assessment of Simultaneous Parallel Approach Feasibility from Total System Error
NASA Technical Reports Server (NTRS)
Madden, Michael M.
2014-01-01
In a simultaneous paired approach to closely-spaced parallel runways, a pair of aircraft flies in close proximity on parallel approach paths. The aircraft pair must maintain a longitudinal separation within a range that avoids wake encounters and, if one of the aircraft blunders, avoids collision. Wake avoidance defines the rear gate of the longitudinal separation. The lead aircraft generates a wake vortex that, with the aid of crosswinds, can travel laterally onto the path of the trail aircraft. As runway separation decreases, the wake has less distance to traverse to reach the path of the trail aircraft. The total system error of each aircraft further reduces this distance. The total system error is often modeled as a probability distribution function. Therefore, Monte-Carlo simulations are a favored tool for assessing a "safe" rear-gate. However, safety for paired approaches typically requires that a catastrophic wake encounter be a rare one-in-a-billion event during normal operation. Using a Monte-Carlo simulation to assert this event rarity with confidence requires a massive number of runs. Such large runs do not lend themselves to rapid turn-around during the early stages of investigation when the goal is to eliminate the infeasible regions of the solution space and to perform trades among the independent variables in the operational concept. One can employ statistical analysis using simplified models more efficiently to narrow the solution space and identify promising trades for more in-depth investigation using Monte-Carlo simulations. These simple, analytical models not only have to address the uncertainty of the total system error but also the uncertainty in navigation sources used to alert an abort of the procedure. 
This paper presents a method for integrating total system error, procedure abort rates, avionics failures, and surveillance errors into a statistical analysis that identifies the likely feasible runway separations for simultaneous paired approaches.
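The claim that a Monte-Carlo simulation needs a massive number of runs to support a one-in-a-billion rarity can be checked with a short calculation. The sketch below is not from the paper; it uses the standard zero-failure binomial bound, assuming independent runs in which no event is observed. The 95% confidence level and the function name `runs_needed` are assumptions made for illustration.

```python
import math


def runs_needed(p_max, confidence):
    """Smallest n such that n event-free independent runs support p <= p_max.

    If the true per-run probability were p_max, the chance of seeing no
    event in n runs would be (1 - p_max) ** n. We require that chance to
    be at most 1 - confidence, which gives
    n >= log(1 - confidence) / log(1 - p_max).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_max))


if __name__ == "__main__":
    n = runs_needed(1e-9, 0.95)  # assumed 95% confidence level
    print(f"{n:.3e} event-free runs needed")
```

Under these assumptions the count is close to three divided by the target probability, so roughly three billion clean runs for a one-in-a-billion event, which is consistent with the abstract's case for analytical models during early trade studies.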
Wang, Zhaocai; Ji, Zuwen; Wang, Xiaoming; Wu, Tunhua; Huang, Wei
2017-12-01
As a promising approach to solving computationally intractable problems, DNA computing is an emerging research area spanning mathematics, computer science and molecular biology. The task scheduling problem, a well-known NP-complete problem, assigns n jobs to m individuals and seeks the minimum execution time of the last individual to finish. In this paper, we use a biologically inspired computational model and describe a new parallel algorithm to solve the task scheduling problem by basic DNA molecular operations. In turn, we skillfully design flexible-length DNA strands to represent elements of the allocation matrix, apply appropriate biological experiment operations and obtain solutions of the task scheduling problem in the proper length range with less than O(n^2) time complexity. Copyright © 2017. Published by Elsevier B.V.
Ordering Traces Logically to Identify Lateness in Message Passing Programs
Isaacs, Katherine E.; Gamblin, Todd; Bhatele, Abhinav; ...
2015-03-30
Event traces are valuable for understanding the behavior of parallel programs. However, automatically analyzing a large parallel trace is difficult, especially without a specific objective. We aid this endeavor by extracting a trace's logical structure, an ordering of trace events derived from happened-before relationships, while taking into account developer intent. Using this structure, we can calculate an operation's delay relative to its peers on other processes. The logical structure also serves as a platform for comparing and clustering processes as well as highlighting communication patterns in a trace visualization. We present an algorithm for determining this idealized logical structure from traces of message passing programs, and we develop metrics to quantify delays and differences among processes. We implement our techniques in Ravel, a parallel trace visualization tool that displays both logical and physical timelines. Rather than showing the duration of each operation, we display where delays begin and end, and how they propagate. As a result, we apply our approach to the traces of several message passing applications, demonstrating the accuracy of our extracted structure and its utility in analyzing these codes.
Enabling the High Level Synthesis of Data Analytics Accelerators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Minutoli, Marco; Castellana, Vito G.; Tumeo, Antonino
Conventional High Level Synthesis (HLS) tools mainly target compute intensive kernels typical of digital signal processing applications. We are developing techniques and architectural templates to enable HLS of data analytics applications. These applications are memory intensive, present fine-grained, unpredictable data accesses, and irregular, dynamic task parallelism. We discuss an architectural template based around a distributed controller to efficiently exploit thread level parallelism. We present a memory interface that supports parallel memory subsystems and enables implementing atomic memory operations. We introduce a dynamic task scheduling approach to efficiently execute heavily unbalanced workloads. The templates are validated by synthesizing queries from the Lehigh University Benchmark (LUBM), a well-known SPARQL benchmark.
A message passing kernel for the hypercluster parallel processing test bed
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Quealy, Angela; Cole, Gary L.
1989-01-01
A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP.
NASA Astrophysics Data System (ADS)
Ganesan, Nandhini; Basu, Suman; Hariharan, Krishnan S.; Kolake, Subramanya Mayya; Song, Taewon; Yeo, Taejung; Sohn, Dong Kee; Doo, Seokgwang
2016-08-01
Lithium-ion batteries used for electric vehicle applications are subject to large currents and various operating conditions, making battery pack design and life extension a challenging problem. As complexity increases, modeling and simulation can lead to insights that ensure optimal performance and life extension. In this manuscript, an electrochemical-thermal (ECT) coupled model for a 6 series × 5 parallel pack is developed for Li-ion cells with NCA/C electrodes and validated against experimental data. The contribution of the cathode to overall degradation at various operating conditions is assessed. Pack asymmetry is analyzed from a design and an operational perspective. Design-based asymmetry leads to a new approach of obtaining the individual cell responses of the pack from an average ECT output. Operational asymmetry is demonstrated in terms of effects of thermal gradients on cycle life, and an efficient model predictive control technique is developed. The concept of a reconfigurable battery pack is studied using detailed simulations that can be used for effective monitoring and extension of battery pack life.
Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.
2014-08-12
Endpoint-based parallel data processing in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Efficient solution of parabolic equations by Krylov approximation methods
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1990-01-01
Numerical techniques for solving parabolic equations by the method of lines are addressed. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. Thus, the resulting approximation consists of applying an evolution operator of a very small dimension to a known vector which is, in turn, computed accurately by exploiting well-known rational approximations to the exponential. Because the rational approximation is only applied to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some relevant approximation and stability issues are discussed. We present some numerical experiments with the method and compare its performance with a few explicit and implicit algorithms.
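The projection idea can be sketched in a few lines: build an orthonormal basis V_m of the Krylov subspace with the Arnoldi process, then approximate exp(tA)v by beta * V_m * exp(t * H_m) * e_1, where H_m is the small projected matrix and beta is the norm of v. The code below is an illustrative pure-Python sketch, not the authors' implementation. In particular, the small exponential is computed here with scaling and squaring plus a Taylor series rather than the rational approximations the abstract mentions, and all function names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def small_expm(H, terms=20):
    """exp(H) for a small dense matrix: scaling and squaring plus Taylor."""
    m = len(H)
    norm = max(sum(abs(x) for x in row) for row in H)
    squarings = 0
    while norm / 2 ** squarings > 0.5:
        squarings += 1
    scaled = [[x / 2 ** squarings for x in row] for row in H]
    result = [[float(i == j) for j in range(m)] for i in range(m)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def krylov_expv(A, v, t, m):
    """Approximate exp(t*A) @ v from an m-dimensional Krylov subspace."""
    beta = math.sqrt(sum(x * x for x in v))
    V = [[x / beta for x in v]]
    H = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(A, V[j])  # the only operation that touches the large matrix
        for i in range(j + 1):  # modified Gram-Schmidt orthogonalization
            H[i][j] = sum(a * b for a, b in zip(w, V[i]))
            w = [a - H[i][j] * b for a, b in zip(w, V[i])]
        h_next = math.sqrt(sum(x * x for x in w))
        if j + 1 == m or h_next < 1e-12:
            m = j + 1  # subspace complete, or invariant subspace found
            break
        H[j + 1][j] = h_next
        V.append([x / h_next for x in w])
    E = small_expm([[t * H[i][j] for j in range(m)] for i in range(m)])
    coeffs = [beta * E[i][0] for i in range(m)]
    return [sum(coeffs[i] * V[i][k] for i in range(m)) for k in range(len(v))]


if __name__ == "__main__":
    n = 6  # tiny 1D diffusion operator, for illustration only
    A = [[-2.0 if i == j else 1.0 if abs(i - j) == 1 else 0.0
          for j in range(n)] for i in range(n)]
    print(krylov_expv(A, [1.0] + [0.0] * (n - 1), t=0.1, m=4))
```

The large matrix appears only inside `matvec`, which is the property the abstract relies on for parallelization and vectorization; everything else operates on the m-by-m matrix H_m.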
Malleable architecture generator for FPGA computing
NASA Astrophysics Data System (ADS)
Gokhale, Maya; Kaba, James; Marks, Aaron; Kim, Jang
1996-10-01
The malleable architecture generator (MARGE) is a tool set that translates high-level parallel C to configuration bit streams for field-programmable logic based computing systems. MARGE creates an application-specific instruction set and generates the custom hardware components required to perform exactly those computations specified by the C program. In contrast to traditional fixed-instruction processors, MARGE's dynamic instruction set creation provides for efficient use of hardware resources. MARGE processes intermediate code in which each operation is annotated by the bit lengths of the operands. Each basic block (sequence of straight line code) is mapped into a single custom instruction which contains all the operations and logic inherent in the block. A synthesis phase maps the operations comprising the instructions into register transfer level structural components and control logic which have been optimized to exploit functional parallelism and function unit reuse. As a final stage, commercial technology-specific tools are used to generate configuration bit streams for the desired target hardware. Technology- specific pre-placed, pre-routed macro blocks are utilized to implement as much of the hardware as possible. MARGE currently supports the Xilinx-based Splash-2 reconfigurable accelerator and National Semiconductor's CLAy-based parallel accelerator, MAPA. The MARGE approach has been demonstrated on systolic applications such as DNA sequence comparison.
Arctic and Antarctic Analogs for Planetary Surface Traverses
NASA Technical Reports Server (NTRS)
Hoffman, Stephen J.; Cameron, A. O.
2009-01-01
The proposed paper summarizes the workshop presentations and discusses several of the key findings or lessons including: (1) A recognition that NASA's current approach for long duration planetary surface operations has fundamental differences from any of the operational approaches described by the invited speakers. These approaches drive the crew size and skill mix to accomplish basic objectives and, in turn, drive the logistical pyramid needed to support these operations. NASA will review the operational approaches of the organizations represented to understand the differentiating factors. NASA will then decide if it should alter its current approach to surface exploration. (2) There are potential parallels between key characteristics of the systems used for exploration in these environments, such as heated volume as an analog for pressurized volume or energy usage for various activities. NASA will look at these characteristics to identify which could help with preliminary planning and gather raw data from the presenters to model these characteristics. (3) New technologies are being applied and design approaches are being tailored to take advantage of these technologies on both sides. Interactions between these two communities have begun or are expanding to understand how these new technologies are being leveraged: NASA habitation designers are exchanging ideas and approaches with the Antarctic station designers; Antarctic support
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m^2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Preliminary Evaluation of MapReduce for High-Performance Climate Data Analysis
NASA Technical Reports Server (NTRS)
Duffy, Daniel Q.; Schnase, John L.; Thompson, John H.; Freeman, Shawn M.; Clune, Thomas L.
2012-01-01
MapReduce is an approach to high-performance analytics that may be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. We are particularly interested in the potential of MapReduce to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we are prototyping a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. Our initial focus has been on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective-Analysis for Research and Applications (MERRA) data. Preliminary results suggest this approach can improve efficiencies within data intensive analytic workflows.
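An averaging operation of the kind described here can be sketched in the map/reduce style on a single machine. The record layout `(time_key, lat, lon, value)` and the function names are assumptions made for illustration; they do not reflect the MERRA file format or the authors' prototype.

```python
from collections import defaultdict

def map_record(record, lat_bounds, lon_bounds):
    """Map step: emit (time_key, (value, 1)) for records inside the spatial box."""
    time_key, lat, lon, value = record
    if lat_bounds[0] <= lat <= lat_bounds[1] and lon_bounds[0] <= lon <= lon_bounds[1]:
        yield time_key, (value, 1)

def reduce_pairs(pairs):
    """Reduce step: add up partial (sum, count) pairs per key, then divide."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (partial_sum, count) in pairs:
        totals[key][0] += partial_sum
        totals[key][1] += count
    return {key: s / n for key, (s, n) in totals.items()}

def average_by_time(records, lat_bounds, lon_bounds):
    """Run the map step over all records and feed the emitted pairs to the reducer."""
    pairs = (p for rec in records for p in map_record(rec, lat_bounds, lon_bounds))
    return reduce_pairs(pairs)
```

Emitting `(sum, count)` pairs rather than averages keeps the reduction associative, so partial results from different nodes can be combined in any order.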
Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C
2010-09-21
We present calculations of formation energies of defects in an ionic solid (Al2O3) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Breaking the Cycle: A Phenomenological Approach to Broadening Access to Post-Secondary Education
ERIC Educational Resources Information Center
Cefai, Carmel; Downes, Paul; Cavioni, Valeria
2016-01-01
Over the past decades, there has been a substantial increase in post-secondary education participation in most Organisation for Economic Co-operation and Development (OECD) and European Union countries. This increase, however, does not necessarily reflect a parallel equitable growth in post-secondary education, and early school leaving is still an…
Super-resolved Parallel MRI by Spatiotemporal Encoding
Schmidt, Rita; Baishya, Bikash; Ben-Eliezer, Noam; Seginer, Amir; Frydman, Lucio
2016-01-01
Recent studies described an alternative “ultrafast” scanning method based on spatiotemporal (SPEN) principles. SPEN demonstrates numerous potential advantages over EPI-based alternatives, at no additional expense in experimental complexity. An important aspect that SPEN still needs to achieve for providing a competitive acquisition alternative entails exploiting parallel imaging algorithms, without compromising its proven capabilities. The present work introduces a combination of multi-band frequency-swept pulses simultaneously encoding multiple, partial fields-of-view; together with a new algorithm merging a Super-Resolved SPEN image reconstruction and SENSE multiple-receiving methods. The ensuing approach enables one to reduce both the excitation and acquisition times of ultrafast SPEN acquisitions by the customary acceleration factor R, without compromises in either the ensuing spatial resolution, SAR deposition, or the capability to operate in multi-slice mode. The performance of these new single-shot imaging sequences and their ancillary algorithms was explored on phantoms and human volunteers at 3T. The gains of the parallelized approach were particularly evident when dealing with heterogeneous systems subject to major T2/T2* effects, as is the case upon single-scan imaging near tissue/air interfaces. PMID:24120293
Broadcasting a message in a parallel computer
Berg, Jeremy E [Rochester, MN; Faraj, Ahmad A [Rochester, MN
2011-08-02
Methods, systems, and products are disclosed for broadcasting a message in a parallel computer. The parallel computer includes a plurality of compute nodes connected together using a data communications network. The data communications network is optimized for point to point data communications and is characterized by at least two dimensions. The compute nodes are organized into at least one operational group of compute nodes for collective parallel operations of the parallel computer. One compute node of the operational group is assigned to be a logical root. Broadcasting a message in a parallel computer includes: establishing a Hamiltonian path along all of the compute nodes in at least one plane of the data communications network and in the operational group; and broadcasting, by the logical root to the remaining compute nodes, the logical root's message along the established Hamiltonian path.
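The broadcast step can be sketched for one plane of a two-dimensional mesh. The abstract does not say how the Hamiltonian path is constructed, so the row-by-row "snake" ordering below is an assumption, as are the function names; the sketch only shows that such a path lets the root's message reach every node through nearest-neighbor, point-to-point hops.

```python
def snake_path(rows, cols):
    """Hamiltonian path over a rows x cols mesh plane:
    left-to-right on even rows, right-to-left on odd rows."""
    path = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in cs)
    return path

def broadcast_hops(path, root):
    """Return the (sender, receiver) hops that carry the root's message
    along the path, outward from the root in both directions."""
    i = path.index(root)
    hops = []
    for k in range(i, len(path) - 1):    # toward the end of the path
        hops.append((path[k], path[k + 1]))
    for k in range(i, 0, -1):            # toward the start of the path
        hops.append((path[k], path[k - 1]))
    return hops
```

Each node other than the root receives the message exactly once, and every hop connects two adjacent mesh nodes, which suits a network optimized for point-to-point traffic.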
Chrestenson transform FPGA embedded factorizations.
Corinthios, Michael J
2016-01-01
Chrestenson generalized Walsh transform factorizations for parallel processing embedded implementations on field programmable gate arrays are presented. This general base transform, sometimes referred to as the Discrete Chrestenson transform, has received special attention in recent years. In fact, the Discrete Fourier transform and Walsh-Hadamard transform are but special cases of the Chrestenson generalized Walsh transform. Rotations of a base-p hypercube, where p is an arbitrary integer, are shown to produce dynamic contention-free memory allocation in processor architecture. The approach is illustrated by factorizations involving the processing of matrices of the transform which are functions of four variables. Parallel operations are implemented as matrix multiplications. Each matrix, of dimension N × N, where N = p^n and n is an integer, has a structure that depends on a variable parameter k that denotes the iteration number in the factorization process. The level of parallelism, in the form of M = p^m processors, can be chosen arbitrarily by varying m between zero and its maximum value of n - 1. The result is an equation describing the generalised parallelism factorization as a function of the four variables n, p, k and m. Applications of the approach are shown in relation to configuring field programmable gate arrays for digital signal processing applications.
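The statement that the Fourier and Walsh-Hadamard transforms are special cases can be made concrete with a small sketch of the N × N transform matrix in natural order. Entry (r, c) is a p-th root of unity raised to the sum of products of the base-p digits of r and c. The sign convention of the exponent and the function names are assumptions of this sketch; the paper's factorizations and processor mapping are not reproduced.

```python
import cmath

def digits(x, p, n):
    """Base-p digits of x, least significant first, padded to n digits."""
    out = []
    for _ in range(n):
        out.append(x % p)
        x //= p
    return out

def chrestenson_matrix(p, n):
    """N x N Chrestenson matrix in natural order, with N = p**n.
    Entry (r, c) = w**(sum_k r_k * c_k), where w = exp(-2*pi*i/p)."""
    N = p ** n
    roots = [cmath.exp(-2j * cmath.pi * e / p) for e in range(p)]
    M = []
    for r in range(N):
        dr = digits(r, p, n)
        row = []
        for c in range(N):
            dc = digits(c, p, n)
            e = sum(a * b for a, b in zip(dr, dc)) % p
            row.append(roots[e])
        M.append(row)
    return M
```

With p = 2 the entries reduce to +1 and -1 (a Walsh-Hadamard matrix), and with n = 1 the matrix is the p-point discrete Fourier transform matrix.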
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA Ames Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Palkowski, Marek; Bielecki, Wlodzimierz
2017-06-02
RNA secondary structure prediction is a compute intensive task that lies at the core of several search algorithms in bioinformatics. Fortunately, the RNA folding approaches, such as the Nussinov base pair maximization, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. Polyhedral compilation techniques have proven to be a powerful tool for optimization of dense array codes. However, classical affine loop nest transformations used with these techniques do not effectively optimize dynamic programming codes for RNA structure prediction. The purpose of this paper is to present a novel approach allowing for generation of a parallel tiled Nussinov RNA loop nest exposing significantly higher performance than that of known related code. This effect is achieved by improving code locality and calculation parallelization. In order to improve code locality, we apply our previously published technique of automatic loop nest tiling to all three loops of the Nussinov loop nest. This approach first forms original rectangular 3D tiles and then corrects them to establish their validity by means of applying the transitive closure of a dependence graph. To produce parallel code, we apply the loop skewing technique to a tiled Nussinov loop nest. The technique is implemented as a part of the publicly available polyhedral source-to-source TRACO compiler. Generated code was run on modern Intel multi-core processors and coprocessors. We present the speed-up factor of generated Nussinov RNA parallel code and demonstrate that it is considerably faster than related codes in which only the two outer loops of the Nussinov loop nest are tiled.
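For orientation, the untiled, serial Nussinov loop nest that such transformations start from can be sketched as follows. This is the textbook base pair maximization recurrence with three nested loops (i, j, k); the pairing rule (Watson-Crick pairs only, no minimum loop length) and the function name are simplifying assumptions, and none of the tiling or skewing performed by TRACO is shown.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}  # assumed pairing rule

def nussinov(seq):
    """Maximum number of nested base pairs in seq (serial, untiled loop nest)."""
    n = len(seq)
    if n == 0:
        return 0
    S = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):            # outer loop over start positions
        for j in range(i + 1, n):             # middle loop over end positions
            best = max(S[i + 1][j], S[i][j - 1])       # leave i or j unpaired
            inner = S[i + 1][j - 1] if i + 1 <= j - 1 else 0
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, inner + 1)            # pair i with j
            for k in range(i + 1, j):         # innermost loop over split points
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]
```

For example, `nussinov("GGCC")` returns 2 (two nested G-C pairs). Each cell depends on entries to its left, below it, and along the split, which is what makes all three loops hard to tile with classical affine transformations.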
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R
Methods, apparatuses, and computer program products for endpoint-based parallel data processing with non-blocking collective instructions in a parallel active messaging interface (`PAMI`) of a parallel computer are provided. Embodiments include establishing by a parallel application a data communications geometry, the geometry specifying a set of endpoints that are used in collective operations of the PAMI, including associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry. Embodiments also include registering in each endpoint in the geometry a dispatch callback function for a collective operation and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
A parallel input composite transimpedance amplifier.
Kim, D J; Kim, C
2018-01-01
A new approach to high performance current to voltage preamplifier design is presented. The design using multiple operational amplifiers (op-amps) has a parasitic capacitance compensation network and a composite amplifier topology for fast, precise, and low noise performance. The input stage, consisting of parallel-linked JFET op-amps and a high-speed bipolar junction transistor (BJT) gain stage driving the output in the composite amplifier topology, cooperating with the capacitance compensation feedback network, ensures wide bandwidth stability in the presence of input capacitance above 40 nF. The design is ideal for any two-probe measurement, including high impedance transport and scanning tunneling microscopy measurements.
NASA Astrophysics Data System (ADS)
Blume, H.; Alexandru, R.; Applegate, R.; Giordano, T.; Kamiya, K.; Kresina, R.
1986-06-01
In a digital diagnostic imaging department, the majority of operations for handling and processing of images can be grouped into a small set of basic operations, such as image data buffering and storage, image processing and analysis, image display, image data transmission and image data compression. These operations occur in almost all nodes of the diagnostic imaging communications network of the department. An image processor architecture was developed in which each of these functions has been mapped into hardware and software modules. The modular approach has advantages in terms of economics, service, expandability and upgradeability. The architectural design is based on the principles of hierarchical functionality, distributed and parallel processing and aims at real time response. Parallel processing and real time response are facilitated in part by a dual bus system: a VME control bus and a high speed image data bus, consisting of 8 independent parallel 16-bit busses, capable of handling up to 144 MBytes/sec combined. The presented image processor is versatile enough to meet the video rate processing needs of digital subtraction angiography, the large pixel matrix processing requirements of static projection radiography, or the broad range of manipulation and display needs of a multi-modality diagnostic work station. Several hardware modules are described in detail. For illustrating the capabilities of the image processor, processed 2000 x 2000 pixel computed radiographs are shown and estimated computation times for executing the processing operations are presented.
Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems
Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.
2012-01-01
As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach. PMID:23355955
Simultaneous G-Quadruplex DNA Logic.
Bader, Antoine; Cockroft, Scott L
2018-04-03
A fundamental principle of digital computer operation is Boolean logic, where inputs and outputs are described by binary integer voltages. Similarly, inputs and outputs may be processed on the molecular level as exemplified by synthetic circuits that exploit the programmability of DNA base-pairing. Unlike modern computers, which execute large numbers of logic gates in parallel, most implementations of molecular logic have been limited to single computing tasks, or sensing applications. This work reports three G-quadruplex-based logic gates that operate simultaneously in a single reaction vessel. The gates respond to unique Boolean DNA inputs by undergoing topological conversion from duplex to G-quadruplex states that were resolved using a thioflavin T dye and gel electrophoresis. The modular, addressable, and label-free approach could be incorporated into DNA-based sensors, or used for resolving and debugging parallel processes in DNA computing applications. © 2018 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
A novel parallel architecture for local histogram equalization
NASA Astrophysics Data System (ADS)
Ohannessian, Mesrob I.; Choueiter, Ghinwa F.; Diab, Hassan
2005-07-01
Local histogram equalization is an image enhancement algorithm that has found wide application in the pre-processing stage of areas such as computer vision, pattern recognition and medical imaging. The computationally intensive nature of the procedure, however, is a main limitation when real time interactive applications are in question. This work explores the possibility of performing parallel local histogram equalization, using an array of special purpose elementary processors, through an HDL implementation that targets FPGA or ASIC platforms. A novel parallelization scheme is presented and the corresponding architecture is derived. The algorithm is reduced to pixel-level operations. Processing elements are assigned image blocks, to maintain a reasonable performance-cost ratio. To further simplify both processor and memory organizations, a bit-serial access scheme is used. A brief performance assessment is provided to illustrate and quantify the merit of the approach.
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.
Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano
2014-09-09
A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.
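The block-cyclic scheme mentioned for the linear algebra stage can be illustrated with the usual index mapping used by ScaLAPACK-style libraries. The sketch assumes the distribution starts at process 0 and uses hypothetical function names; it does not show BERTHA's integral-driven distribution or the procedure for switching between the two.

```python
def block_cyclic_owner(g, block, nprocs):
    """Map a global index g to (process, local index) in a 1D block-cyclic
    distribution with the given block size, starting at process 0."""
    proc = (g // block) % nprocs
    local = (g // (block * nprocs)) * block + g % block
    return proc, local

def block_cyclic_owner_2d(i, j, block, prow, pcol):
    """Owner coordinates and local indices of matrix element (i, j)
    on a prow x pcol process grid with square blocks."""
    pr, li = block_cyclic_owner(i, block, prow)
    pc, lj = block_cyclic_owner(j, block, pcol)
    return (pr, pc), (li, lj)
```

Because consecutive blocks go to different processes, each process holds pieces from all regions of the matrix, which balances the work in dense factorizations and diagonalizations.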
A Parallel Processing Algorithm for Remote Sensing Classification
NASA Technical Reports Server (NTRS)
Gualtieri, J. Anthony
2005-01-01
A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level computers using the Linux operating system. For example, on the Medusa cluster at NASA/GSFC, this provides for supercomputing performance, 130 Gflops (Linpack Benchmark), at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
Tactical physical preparation: the case for a movement-based approach.
Kechijian, Doug; Rush, Stephen
2012-01-01
Progressive injury prevention and physical preparation programs are needed in military special operations to optimize mission success and Operator quality of life and longevity. While physical risk is inherent in Special Operations, non-traumatic injuries resulting from overuse, poor biomechanics, and arbitrary exercise selection can be alleviated with proper medical care and patient education. An integrated approach to physical readiness that recognizes the continuity between rehabilitation and performance training is advocated to ensure that physiological adaptations do not come at the expense of orthopedic health or movement proficiency. Movement quality should be regularly evaluated and enforced throughout the training process to minimize preventable injuries and avoid undermining previous rehabilitative care. While fitness and proper movement are not substitutes for Operator specific tasks, they are foundational to many tactically-relevant skills. In light of how much is at stake, sports medicine care in the military, especially special operations, should parallel that which is practiced in professional and collegiate athletics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishioka, K.; Nakamura, Y.; Nishimura, S.
A moment approach to calculate neoclassical transport in non-axisymmetric torus plasmas composed of multiple ion species is extended to include the external parallel momentum sources due to unbalanced tangential neutral beam injections (NBIs). The momentum sources that are included in the parallel momentum balance are calculated from the collision operators of background particles with fast ions. This method is applied for the clarification of the physical mechanism of the neoclassical parallel ion flows and the multi-ion species effect on them in Heliotron J NBI plasmas. It is found that parallel ion flow can be determined by the balance between the parallel viscosity and the external momentum source in the region where the external source is much larger than the thermodynamic force driven source in the collisional plasmas. This is because the friction between C6+ and D+ prevents a large difference between C6+ and D+ flow velocities in such plasmas. The C6+ flow velocities, which are measured by the charge exchange recombination spectroscopy system, are numerically evaluated with this method. It is shown that the experimentally measured C6+ impurity flow velocities do not clearly contradict the neoclassical estimations, and the dependence of parallel flow velocities on the magnetic field ripples is consistent in both results.
Engineering approach for cost effective operation of industrial pump systems
NASA Astrophysics Data System (ADS)
Krickis, O.; Oleksijs, R.
2017-10-01
Power plant operators are encouraged to operate main equipment such as centrifugal pumps in an economically effective way. The pump sets of a district heating network at power plants should be operated according to the prescriptions of the original equipment manufacturer, with these requirements then implemented in the distributed control system of the plant. In order to operate industrial pump sets with a small number of malfunctions, it is necessary to control the duty point of the pump sets in H-Q coordinates, which can be a complex task in some installations. Alternatively, pump operation control can be organized in H-n (head vs rpm) coordinates, utilizing pressure transmitters in the pressure pipeline and the rpm value from the variable speed drive. The safe operation range of the pump has to be limited by system parabolas, which prevent the duty point from being located outside the predefined operation area. This study demonstrates an engineering approach to developing safe pump operation control in the MATLAB/Simulink environment, which makes it possible to simulate the operation of the pump at different capacities in a hydraulic system with a variable characteristic and to predefine the conditions for efficient simultaneous pump operation in parallel connection.
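Why limits drawn as system parabolas in H-Q coordinates can be carried over to H-n coordinates follows from the ideal pump affinity laws. The short derivation below is a sketch under that idealization (geometrically similar operating points, constant efficiency); the symbols $k$, $Q_0$, $H_0$ and $n_0$ are introduced here for illustration and do not come from the study.

```latex
% Ideal affinity laws between two speeds n_1 and n_2:
\frac{Q_2}{Q_1} = \frac{n_2}{n_1}, \qquad
\frac{H_2}{H_1} = \left(\frac{n_2}{n_1}\right)^{2}

% A system parabola through the reference duty point (Q_0, H_0) at speed n_0:
H = k\,Q^{2}, \qquad k = \frac{H_0}{Q_0^{2}}

% Scaling the reference point with speed keeps it on the same parabola,
% since H/Q^2 = H_0/Q_0^2 = k for every n:
Q(n) = Q_0\,\frac{n}{n_0}, \qquad
H(n) = H_0\left(\frac{n}{n_0}\right)^{2}

% Two limiting parabolas k_{\min} < k_{\max}, with reference heads
% H_{0,\min} and H_{0,\max} at n_0, therefore bound the head at speed n:
H_{0,\min}\left(\frac{n}{n_0}\right)^{2} \;\le\; H \;\le\;
H_{0,\max}\left(\frac{n}{n_0}\right)^{2}
```

Under these assumptions a measured head and the drive speed are enough to check whether the duty point lies between the two limiting curves, without a flow measurement.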
Ensemble Sampling vs. Time Sampling in Molecular Dynamics Simulations of Thermal Conductivity
Gordiz, Kiarash; Singh, David J.; Henry, Asegun
2015-01-29
In this report we compare time sampling and ensemble averaging as two different methods available for phase space sampling. For the comparison, we calculate thermal conductivities of solid argon and silicon structures, using equilibrium molecular dynamics. We introduce two different schemes for the ensemble averaging approach, and show that both can reduce the total simulation time as compared to time averaging. It is also found that velocity rescaling is an efficient mechanism for phase space exploration. Although our methodology is tested using classical molecular dynamics, the ensemble generation approaches may find their greatest utility in computationally expensive simulations such as first principles molecular dynamics. For such simulations, where each time step is costly, time sampling can require long simulation times because each time step must be evaluated sequentially and therefore phase space averaging is achieved through sequential operations. On the other hand, with ensemble averaging, phase space sampling can be achieved through parallel operations, since each ensemble is independent. For this reason, particularly when using massively parallel architectures, ensemble sampling can result in much shorter simulation times and exhibits similar overall computational effort.
Assessment of compatibility of ICRF antenna operation with full W wall in ASDEX Upgrade
NASA Astrophysics Data System (ADS)
Bobkov, Vl. V.; Braun, F.; Dux, R.; Herrmann, A.; Giannone, L.; Kallenbach, A.; Krivska, A.; Müller, H. W.; Neu, R.; Noterdaeme, J.-M.; Pütterich, T.; Rohde, V.; Schweinzer, J.; Sips, A.; Zammuto, I.; ASDEX Upgrade Team
2010-03-01
The compatibility of ICRF (ion cyclotron range of frequencies) antenna operation with high-Z plasma facing components is assessed in ASDEX Upgrade (AUG) with its tungsten (W) first wall. The mechanism of ICRF-related W sputtering was studied by various diagnostics including the local spectroscopic measurements of W sputtering yield YW on antenna limiters. Modification of one antenna with triangular shields, which cover the locations where long magnetic field lines pass only one out of two (0π)-phased antenna straps, did not influence the locally measured YW values markedly. In the experiments with antennas powered individually, poloidal profiles of YW on limiters of powered antennas show high YW close to the equatorial plane and at the very edge of the antenna top. The YW-profile on an unpowered antenna limiter peaks at the location projecting to the top of the powered antenna. An interpretation of the YW measurements is presented, assuming a direct link between the W sputtering and the sheath driving RF voltages deduced from parallel electric near-field (E||) calculations and this suggests a strong E|| at the antenna limiters. However, uncertainties are too large to describe the YW poloidal profiles. In order to reduce ICRF-related rise in W concentration CW, an operational approach and an approach based on calculations of parallel electric fields with new antenna designs are considered. In the operation, a noticeable reduction in YW and CW in the plasma during ICRF operation with W wall can be achieved by (a) increasing plasma-antenna clearance; (b) strong gas puffing; (c) decreasing the intrinsic light impurity content (mainly oxygen and carbon in AUG). In calculations, which take into account a realistic antenna geometry, the high E|| fields at the antenna limiters are reduced in several ways: (a) by extending the antenna box and the surrounding structures parallel to the magnetic field; (b) by increasing the average strap-box distance, e.g. by increasing the number of toroidally distributed straps; (c) by a better balance of (0π)-phased contributions to RF image currents.
Multisensory architectures for action-oriented perception
NASA Astrophysics Data System (ADS)
Alba, L.; Arena, P.; De Fiore, S.; Listán, J.; Patané, L.; Salem, A.; Scordino, G.; Webb, B.
2007-05-01
In order to solve the navigation problem of a mobile robot in an unstructured environment, a versatile sensory system and efficient locomotion control algorithms are necessary. In this paper an innovative sensory system for action-oriented perception applied to a legged robot is presented. An important problem we address is how to utilize a large variety and number of sensors, while having systems that can operate in real time. Our solution is to use sensory systems that incorporate analog and parallel processing, inspired by biological systems, to reduce the required data exchange with the motor control layer. In particular, as concerns the visual system, we use the Eye-RIS v1.1 board made by Anafocus, which is based on a fully parallel mixed-signal array sensor-processor chip. The hearing sensor is inspired by the cricket hearing system and allows efficient localization of a specific sound source with a very simple analog circuit. Our robot utilizes additional sensors for touch, posture, load, distance, and heading, and thus requires customized and parallel processing for concurrent acquisition. Therefore, Field Programmable Gate Array (FPGA) based hardware was used to manage the multi-sensory acquisition and processing. This choice was made because FPGAs permit the implementation of customized digital logic blocks that can operate in parallel, allowing the sensors to be driven simultaneously. With this approach the proposed multi-sensory architecture can achieve real time capabilities.
Ha, S; Matej, S; Ispiryan, M; Mueller, K
2013-02-01
We describe a GPU-accelerated framework that efficiently models spatially (shift) variant system response kernels and performs forward- and back-projection operations with these kernels for the DIRECT (Direct Image Reconstruction for TOF) iterative reconstruction approach. Inherent challenges arise from the poor memory cache performance at non-axis aligned TOF directions. Focusing on the GPU memory access patterns, we utilize different kinds of GPU memory according to these patterns in order to maximize the memory cache performance. We also exploit the GPU instruction-level parallelism to efficiently hide long latencies from the memory operations. Our experiments indicate that our GPU implementation of the projection operators has slightly faster or approximately comparable time performance than FFT-based approaches using state-of-the-art FFTW routines. However, most importantly, our GPU framework can also efficiently handle any generic system response kernels, such as spatially symmetric and shift-variant as well as spatially asymmetric and shift-variant, both of which an FFT-based approach cannot cope with.
Bilingual parallel programming
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foster, I.; Overbeek, R.
1990-01-01
Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seem to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.
Airborne Precision Spacing for Dependent Parallel Operations Interface Study
NASA Technical Reports Server (NTRS)
Volk, Paul M.; Takallu, M. A.; Hoffler, Keith D.; Weiser, Jarold; Turner, Dexter
2012-01-01
This paper describes a usability study of proposed cockpit interfaces to support Airborne Precision Spacing (APS) operations for aircraft performing dependent parallel approaches (DPA). NASA has proposed an airborne system called Pair Dependent Speed (PDS) which uses their Airborne Spacing for Terminal Arrival Routes (ASTAR) algorithm to manage spacing intervals. Interface elements were designed to facilitate the input of APS-DPA spacing parameters to ASTAR, and to convey PDS system information to the crew deemed necessary and/or helpful to conduct the operation, including: target speed, guidance mode, target aircraft depiction, and spacing trend indication. In the study, subject pilots observed recorded simulations using the proposed interface elements in which the ownship managed assigned spacing intervals from two other arriving aircraft. Simulations were recorded using the Aircraft Simulation for Traffic Operations Research (ASTOR) platform, a medium-fidelity simulator based on a modern Boeing commercial glass cockpit. Various combinations of the interface elements were presented to subject pilots, and feedback was collected via structured questionnaires. The results of subject pilot evaluations show that the proposed design elements were acceptable, and that preferable combinations exist within this set of elements. The results also point to potential improvements to be considered for implementation in future experiments.
NASA Astrophysics Data System (ADS)
Cruz Jiménez, Miriam Guadalupe; Meyer Baese, Uwe; Jovanovic Dolecek, Gordana
2017-12-01
New theoretical lower bounds for the number of operators needed in fixed-point constant multiplication blocks are presented. The multipliers are constructed with the shift-and-add approach, where every arithmetic operation is pipelined, and with the generalization that n-input pipelined additions/subtractions are allowed, along with pure pipelining registers. These lower bounds, tighter than the state-of-the-art theoretical limits, are particularly useful in early design stages for a quick assessment in the hardware utilization of low-cost constant multiplication blocks implemented in the newest families of field programmable gate array (FPGA) integrated circuits.
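The shift-and-add construction mentioned above can be illustrated with a minimal sketch. It assumes canonical signed-digit (CSD) recoding, a common textbook way to pick the shifts, and it is not the construction or the lower bound from the paper. The function names `csd_digits` and `multiply_shift_add` are hypothetical, and pipelining registers are not modeled.

```python
def csd_digits(c):
    """Signed digits (-1, 0, +1) of a non-negative constant c, least significant first.

    The digits are in non-adjacent form: no two neighbouring digits are non-zero.
    """
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_shift_add(x, c):
    """Multiply x by the constant c using only shifts, additions and subtractions.

    Returns (product, adders), where adders counts the two-input
    additions/subtractions a hardware block built this way would need.
    """
    acc = 0
    terms = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift
            terms += 1
        elif d == -1:
            acc -= x << shift
            terms += 1
    return acc, max(terms - 1, 0)


product, adders = multiply_shift_add(5, 7)  # 7 = 8 - 1, so one subtraction suffices
```

The adder count returned here is only what this particular recoding uses; the bounds in the abstract concern the minimum over all possible constructions, including multi-input pipelined adders.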
Archer, Charles J; Blocksome, Michael E; Ratterman, Joseph D; Smith, Brian E
2014-02-11
Endpoint-based parallel data processing in a parallel active messaging interface ('PAMI') of a parallel computer, the PAMI composed of data communications endpoints, each endpoint including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task, the compute nodes coupled for data communications through the PAMI, including establishing a data communications geometry, the geometry specifying, for tasks representing processes of execution of the parallel application, a set of endpoints that are used in collective operations of the PAMI including a plurality of endpoints for one of the tasks; receiving in endpoints of the geometry an instruction for a collective operation; and executing the instruction for a collective operation through the endpoints in dependence upon the geometry, including dividing data communications operations among the plurality of endpoints for one of the tasks.
Xray: N-dimensional, labeled arrays for analyzing physical datasets in Python
NASA Astrophysics Data System (ADS)
Hoyer, S.
2015-12-01
Efficient analysis of geophysical datasets requires tools that both preserve and utilize metadata, and that transparently scale to process large datasets. Xray is such a tool, in the form of an open source Python library for analyzing the labeled, multi-dimensional array (tensor) datasets that are ubiquitous in the Earth sciences. Xray's approach pairs Python data structures based on the data model of the netCDF file format with the proven design and user interface of pandas, the popular Python data analysis library for labeled tabular data. On top of the NumPy array, xray adds labeled dimensions (e.g., "time") and coordinate values (e.g., "2015-04-10"), which it uses to enable a host of operations powered by these labels: selection, aggregation, alignment, broadcasting, split-apply-combine, interoperability with pandas and serialization to netCDF/HDF5. Many of these operations are enabled by xray's tight integration with pandas. Finally, to allow for easy parallelism and to enable its labeled data operations to scale to datasets that do not fit into memory, xray integrates with the parallel processing library dask.
Feasibility study for the implementation of NASTRAN on the ILLIAC 4 parallel processor
NASA Technical Reports Server (NTRS)
Field, E. I.
1975-01-01
The ILLIAC IV, a fourth generation multiprocessor using parallel processing hardware concepts, is operational at Moffett Field, California. Its capability to excel at matrix manipulation makes the ILLIAC well suited for performing structural analyses using the finite element displacement method. The feasibility of modifying the NASTRAN (NASA structural analysis) computer program to make effective use of the ILLIAC IV was investigated. The characteristics of the ILLIAC and of the ARPANET, a telecommunications network which spans the continent and makes the ILLIAC accessible to nearly all major industrial centers in the United States, are summarized. Two distinct approaches are studied: retaining NASTRAN as it now operates on many of the host computers of the ARPANET to process the input and output while using the ILLIAC only for the major computational tasks, and installing NASTRAN to operate entirely in the ILLIAC environment. Though both alternatives offer similar and significant increases in computational speed over modern third generation processors, the full installation of NASTRAN on the ILLIAC is recommended. Specifications for performing that task are presented, with corresponding manpower estimates and schedules.
The parallel algorithm for the 2D discrete wavelet transform
NASA Astrophysics Data System (ADS)
Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel
2018-04-01
The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing using single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to a large number of steps. On such architectures, the number of steps corresponds to the number of points that represent the exchange of data. Consequently, these points often form a performance bottleneck. Our approach appropriately rearranges calculations inside the transform, and thereby reduces the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
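To make the notion of lifting steps concrete, here is a minimal one-dimensional sketch of the conventional scheme that the abstract takes as its baseline. It assumes the CDF 5/3 wavelet with mirrored boundaries, which the abstract does not name, and it does not reproduce the rearranged scheme proposed by the authors. The function names are hypothetical.

```python
def cdf53_forward(x):
    """One level of a 1-D CDF 5/3 transform as two lifting steps (predict, update)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    even = [float(v) for v in x[0::2]]
    odd = [float(v) for v in x[1::2]]
    n = len(even)
    # Step 1 (predict): every detail value depends only on the input samples,
    # so all n values of this step could be computed concurrently.
    detail = [odd[i] - 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    # Step 2 (update): needs the finished detail values of step 1, which is
    # the data exchange point between the two steps.
    approx = [even[i] + 0.25 * (detail[max(i - 1, 0)] + detail[i]) for i in range(n)]
    return approx, detail


def cdf53_inverse(approx, detail):
    """Undo the two lifting steps in reverse order."""
    n = len(approx)
    even = [approx[i] - 0.25 * (detail[max(i - 1, 0)] + detail[i]) for i in range(n)]
    odd = [detail[i] + 0.5 * (even[i] + even[min(i + 1, n - 1)]) for i in range(n)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


approx, detail = cdf53_forward([1, 2, 3, 4, 5, 6, 7, 8])
restored = cdf53_inverse(approx, detail)
```

Each list comprehension is internally parallel, but the second cannot start until the first has finished. Wavelets with more lifting steps accumulate more such synchronization points, which is the bottleneck the abstract describes.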
Domain Decomposition By the Advancing-Partition Method for Parallel Unstructured Grid Generation
NASA Technical Reports Server (NTRS)
Pirzadeh, Shahyar Z.; Zagaris, George
2009-01-01
A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
NASA Astrophysics Data System (ADS)
Kuznetsov, P. A.; Kovalev, I. V.; Losev, V. V.; Kalinin, A. O.; Murygin, A. V.
2016-04-01
The article discusses the reliability of automated control systems. It analyzes approaches to classifying systems by health state. Such a classification can be the traditional binary approach, which operates with the concept of "serviceability", or another variant of system state estimation. The article presents one such variant, which provides a selective evaluation of components with respect to the reliability of the entire system. It describes various automatic control systems and their elements from the point of view of health and risk, together with a mathematical method for determining the transition of an object from state to state; the states differ from each other in how the objective function is fulfilled. It explores the interplay of elements in different states and the aggregate state of elements connected in series or in parallel. Tables of the various logic states and the principles of their calculation for series and parallel connection are given. The proposed approach is illustrated through simulation by finding the probability of the system reaching a given state for elements connected in parallel and in series, with different probabilities of moving from state to state. In general, the material of the article will be useful for analyzing the reliability of automated control systems and for engineering highly reliable systems. This mechanism for determining the state of the system provides more detailed information about it and allows a selective approach to the reliability of the system as a whole. Detailed results from assessing the reliability of automated control systems allow the engineer to make an informed decision when designing means of improving reliability.
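The aggregation of element states in series and parallel connections can be sketched as follows. This is an illustration under stated assumptions, not the method of the article: the three state labels are hypothetical, elements are assumed independent, a series chain is assumed to be as bad as its worst element, and a parallel group as good as its best element.

```python
from itertools import product

# Hypothetical state labels, ordered from best (index 0) to worst.
STATES = ("ok", "degraded", "failed")


def combine(distributions, mode):
    """State distribution of independent elements connected in series or in parallel.

    Assumption of this sketch: series takes the worst element state,
    parallel takes the best element state.
    """
    if mode not in ("series", "parallel"):
        raise ValueError("mode must be 'series' or 'parallel'")
    pick = max if mode == "series" else min
    result = {state: 0.0 for state in STATES}
    # Enumerate every combination of element states and accumulate its probability.
    for combo in product(range(len(STATES)), repeat=len(distributions)):
        probability = 1.0
        for dist, index in zip(distributions, combo):
            probability *= dist[STATES[index]]
        result[STATES[pick(combo)]] += probability
    return result


element = {"ok": 0.80, "degraded": 0.15, "failed": 0.05}
series = combine([element, element], "series")
parallel = combine([element, element], "parallel")
```

With only two states the same function reduces to the familiar binary formulas: the product of the element reliabilities for a series chain, and one minus the product of the failure probabilities for a parallel group.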
Increased Energy Delivery for Parallel Battery Packs with No Regulated Bus
NASA Astrophysics Data System (ADS)
Hsu, Chung-Ti
In this dissertation, a new approach to paralleling different battery types is presented. A method for controlling charging/discharging of different battery packs by using low-cost bi-directional switches instead of DC-DC converters is proposed. The proposed system architecture, algorithms, and control techniques allow batteries with different chemistry, voltage, and SOC to be properly charged and discharged in parallel without causing safety problems. The physical design and cost for the energy management system is substantially reduced. Additionally, specific types of failures in the maximum power point tracking (MPPT) in a photovoltaic (PV) system when tracking only the load current of a DC-DC converter are analyzed. The periodic nonlinear load current will lead MPPT realized by the conventional perturb and observe (P&O) algorithm to be problematic. A modified MPPT algorithm is proposed and it still only requires typically measured signals, yet is suitable for both linear and periodic nonlinear loads. Moreover, for a modular DC-DC converter using several converters in parallel, the input power from PV panels is processed and distributed at the module level. Methods for properly implementing distributed MPPT are studied. A new approach to efficient MPPT under partial shading conditions is presented. The power stage architecture achieves fast input current change rate by combining a current-adjustable converter with a few converters operating at a constant current.
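For reference, the conventional perturb and observe (P&O) loop that the dissertation takes as its starting point can be sketched as below. The sketch assumes a static, single-peak power curve and perturbs the operating voltage directly; the curve `toy_pv_power`, the step size and the function names are hypothetical. It does not include the modified algorithm for periodic nonlinear loads, which the abstract does not specify.

```python
def perturb_and_observe(power_at, v_start, step=0.5, iterations=100):
    """Conventional P&O: keep perturbing in the direction that increased power."""
    v = v_start
    p_prev = power_at(v)
    direction = 1
    for _ in range(iterations):
        v += direction * step       # perturb the operating point
        p = power_at(v)             # observe the resulting power
        if p < p_prev:
            direction = -direction  # power dropped, so reverse the perturbation
        p_prev = p
    return v


def toy_pv_power(v):
    """Hypothetical single-peak power curve with its maximum of 60 W at 17 V."""
    return 60.0 - (v - 17.0) ** 2


v_final = perturb_and_observe(toy_pv_power, v_start=10.0)
```

The loop ends up oscillating within one step of the peak. Its decision rule assumes that any change in observed power was caused by its own perturbation, so a load whose current varies periodically on its own can mislead it, which is the failure mode the abstract refers to.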
Philips, Patrick J.; Stinson, Beverley; Zaugg, Steven D.; Furlong, Edward T.; Kolpin, Dana W.; Esposito, Kathleen; Bodniewicz, B.; Pape, R.; Anderson, J.
2005-01-01
The second phase of the study focused on one of the most common wastewater treatment processes operated in the United States, the Activated Sludge process. Using four controlled parallel activated sludge pilots, a more detailed assessment of the impact of Sludge Retention Time (SRT) on the reduction or removal of ECs was performed.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool
NASA Astrophysics Data System (ADS)
Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.
1997-12-01
Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often falls short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables parallel storage access routines and sequential image processing operations to be combined efficiently. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.
NASA Astrophysics Data System (ADS)
Degtyarev, Alexander; Khramushin, Vasily
2016-02-01
The paper deals with the computer implementation of direct computational experiments in fluid mechanics, constructed on the basis of the approach developed by the authors. The proposed approach allows the use of an explicit numerical scheme, which is an important condition for increasing the efficiency of the algorithms developed by numerical procedures with natural parallelism. The paper examines the main objects and operations that make it possible to manage computational experiments and monitor the status of the computation process. Special attention is given to a) realization of tensor representations of numerical schemes for direct simulation; b) realization of representation of large particles of a continuous medium motion in two coordinate systems (global and mobile); c) computing operations in the projections of coordinate systems, direct and inverse transformation in these systems. Particular attention is paid to the use of hardware and software of modern computer systems.
Control and protection system for paralleled modular static inverter-converter systems
NASA Technical Reports Server (NTRS)
Birchenough, A. G.; Gourash, F.
1973-01-01
A control and protection system was developed for use with a paralleled 2.5-kWe-per-module static inverter-converter system. The control and protection system senses internal and external fault parameters such as voltage, frequency, current, and paralleling current unbalance. A logic system controls contactors to isolate defective power conditioners or loads. The system sequences contactor operation to automatically control parallel operation, startup, and fault isolation. Transient overload protection and fault checking sequences are included. The operation and performance of a control and protection system, with detailed circuit descriptions, are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, T.L.; Musicki, Z.; Kohut, P.
1994-06-01
During 1989, the Nuclear Regulatory Commission (NRC) initiated an extensive program to carefully examine the potential risks during low power and shutdown operations. The program includes two parallel projects being performed by Brookhaven National Laboratory (BNL) and Sandia National Laboratories (SNL). Two plants, Surry (pressurized water reactor) and Grand Gulf (boiling water reactor), were selected as the plants to be studied. The objectives of the program are to assess the risks of severe accidents initiated during plant operational states other than full power operation and to compare the estimated core damage frequencies, important accident sequences and other qualitative and quantitative results with those accidents initiated during full power operation as assessed in NUREG-1150. The objective of this report is to document the approach utilized in the Surry plant and discuss the results obtained. A parallel report for the Grand Gulf plant is prepared by SNL. This study shows that the core-damage frequency during mid-loop operation at the Surry plant is comparable to that of power operation. The authors recognize that there is very large uncertainty in the human error probabilities in this study. This study identified that only a few procedures are available for mitigating accidents that may occur during shutdown. Procedures written specifically for shutdown accidents would be useful.
A Process Algebraic Approach to Software Architecture Design
NASA Astrophysics Data System (ADS)
Aldini, Alessandro; Bernardo, Marco; Corradini, Flavio
Process algebra is a formal tool for the specification and the verification of concurrent and distributed systems. It supports compositional modeling through a set of operators able to express concepts like sequential composition, alternative composition, and parallel composition of action-based descriptions. It also supports mathematical reasoning via a two-level semantics, which formalizes the behavior of a description by means of an abstract machine obtained from the application of structural operational rules and then introduces behavioral equivalences able to relate descriptions that are syntactically different. In this chapter, we present the typical behavioral operators and operational semantic rules for a process calculus in which no notion of time, probability, or priority is associated with actions. Then, we discuss the three most studied approaches to the definition of behavioral equivalences - bisimulation, testing, and trace - and we illustrate their congruence properties, sound and complete axiomatizations, modal logic characterizations, and verification algorithms. Finally, we show how these behavioral equivalences and some of their variants are related to each other on the basis of their discriminating power.
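The difference in discriminating power between trace equivalence and bisimulation can be shown on a small labeled transition system. The sketch below is a naive fixed-point check written for illustration, not one of the verification algorithms covered in the chapter; the state names and the two example processes, a.(b + c) and a.b + a.c, are assumptions chosen because they are the standard textbook pair.

```python
def traces(transitions, state):
    """All finite traces from state; assumes the transition system is acyclic."""
    result = {()}
    for source, action, target in transitions:
        if source == state:
            result |= {(action,) + rest for rest in traces(transitions, target)}
    return result


def bisimilar(transitions, s, t):
    """Naive strong bisimilarity check on a finite labeled transition system."""
    states = {x for (p, _, q) in transitions for x in (p, q)} | {s, t}
    succ = {x: set() for x in states}
    for p, a, q in transitions:
        succ[p].add((a, q))
    # Start from the full relation and delete pairs that break the transfer
    # property until nothing changes; what remains is the largest bisimulation.
    relation = {(p, q) for p in states for q in states}
    changed = True
    while changed:
        changed = False
        for p, q in list(relation):
            forth = all(any(a == b and (p2, q2) in relation for b, q2 in succ[q])
                        for a, p2 in succ[p])
            back = all(any(a == b and (p2, q2) in relation for a, p2 in succ[p])
                       for b, q2 in succ[q])
            if not (forth and back):
                relation.discard((p, q))
                changed = True
    return (s, t) in relation


# p0 models a.(b + c); q0 models a.b + a.c
lts = {
    ("p0", "a", "p1"), ("p1", "b", "p2"), ("p1", "c", "p3"),
    ("q0", "a", "q1"), ("q0", "a", "q2"), ("q1", "b", "q3"), ("q2", "c", "q4"),
}
same_traces = traces(lts, "p0") == traces(lts, "q0")
same_behavior = bisimilar(lts, "p0", "q0")
```

Both processes have the traces a, ab and ac, yet after the initial a the second one has already committed to either b or c, so no bisimulation relates the two initial states.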
Portability and Cross-Platform Performance of an MPI-Based Parallel Polygon Renderer
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1999-01-01
Visualizing the results of computations performed on large-scale parallel computers is a challenging problem, due to the size of the datasets involved. One approach is to perform the visualization and graphics operations in place, exploiting the available parallelism to obtain the necessary rendering performance. Over the past several years, we have been developing algorithms and software to support visualization applications on NASA's parallel supercomputers. Our results have been incorporated into a parallel polygon rendering system called PGL. PGL was initially developed on tightly-coupled distributed-memory message-passing systems, including Intel's iPSC/860 and Paragon, and IBM's SP2. Over the past year, we have ported it to a variety of additional platforms, including the HP Exemplar, SGI Origin2000, Cray T3E, and clusters of Sun workstations. In implementing PGL, we have had two primary goals: cross-platform portability and high performance. Portability is important because (1) our manpower resources are limited, making it difficult to develop and maintain multiple versions of the code, and (2) NASA's complement of parallel computing platforms is diverse and subject to frequent change. Performance is important in delivering adequate rendering rates for complex scenes and ensuring that parallel computing resources are used effectively. Unfortunately, these two goals are often at odds. In this paper we report on our experiences with portability and performance of the PGL polygon renderer across a range of parallel computing platforms.
Parallel asynchronous systems and image processing algorithms
NASA Technical Reports Server (NTRS)
Coon, D. D.; Perera, A. G. U.
1989-01-01
A new hardware approach to implementation of image processing algorithms is described. The approach is based on silicon devices which would permit an independent analog processing channel to be dedicated to every pixel. A laminar architecture consisting of a stack of planar arrays of the device would form a two-dimensional array processor with a 2-D array of inputs located directly behind a focal plane detector array. A 2-D image data stream would propagate in neuronlike asynchronous pulse coded form through the laminar processor. Such systems would integrate image acquisition and image processing. Acquisition and processing would be performed concurrently as in natural vision systems. The research is aimed at implementation of algorithms, such as the intensity dependent summation algorithm and pyramid processing structures, which are motivated by the operation of natural vision systems. Implementation of natural vision algorithms would benefit from the use of neuronlike information coding and the laminar, 2-D parallel, vision system type architecture. Besides providing a neural network framework for implementation of natural vision algorithms, a 2-D parallel approach could eliminate the serial bottleneck of conventional processing systems. Conversion to serial format would occur only after raw intensity data has been substantially processed. An interesting challenge arises from the fact that the mathematical formulation of natural vision algorithms does not specify the means of implementation, so that hardware implementation poses intriguing questions involving vision science.
PUP: An Architecture to Exploit Parallel Unification in Prolog
1988-03-01
environment stacking model similar to the Warren Abstract Machine [23] since it has been shown to be superior to other known models (see [21]). The storage...execute in groups of independent operations. Unifications belonging to different groups may not overlap. Also unification operations belonging to the...since all parallel operations on the unification units must complete before any of the units can start executing the next group of parallel
Parallel Algorithms and Patterns
DOE Office of Scientific and Technical Information (OSTI.GOV)
Robey, Robert W.
2016-06-16
This is a powerpoint presentation on parallel algorithms and patterns. A parallel algorithm is a well-defined, step-by-step computational procedure that emphasizes concurrency to solve a problem. Examples of problems include: Sorting, searching, optimization, matrix operations. A parallel pattern is a computational step in a sequence of independent, potentially concurrent operations that occurs in diverse scenarios with some frequency. Examples are: Reductions, prefix scans, ghost cell updates. We only touch on parallel patterns in this presentation. It really deserves its own detailed discussion which Gabe Rockefeller would like to develop.
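Two of the patterns named above, reductions and prefix scans, can be sketched in a few lines. The code below runs sequentially and only mirrors the step structure of the parallel versions (a pairwise tree for the reduction, a Hillis-Steele style doubling scan); it is an illustration under those assumptions and is not taken from the presentation. The function names are hypothetical.

```python
from operator import add


def tree_reduce(values, op=add):
    """Reduce with an associative op by combining neighbours pairwise.

    All pairs within one round are independent, so a parallel machine
    needs about log2(n) rounds instead of n - 1 sequential steps.
    """
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    while len(data) > 1:
        paired = [op(data[i], data[i + 1]) for i in range(0, len(data) - 1, 2)]
        if len(data) % 2:
            paired.append(data[-1])  # odd element carries over to the next round
        data = paired
    return data[0]


def inclusive_scan(values, op=add):
    """Inclusive prefix scan in about log2(n) rounds.

    Every element of a round reads only the array of the previous round,
    so the updates within a round could run concurrently.
    """
    data = list(values)
    offset = 1
    while offset < len(data):
        data = [data[i] if i < offset else op(data[i - offset], data[i])
                for i in range(len(data))]
        offset *= 2
    return data


total = tree_reduce([3, 1, 4, 1, 5])
prefix = inclusive_scan([3, 1, 4, 1, 5])
```

Both functions keep operands in their original left-to-right order, so they also work for associative operations that are not commutative, such as string concatenation.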
Multiple curved descending approaches and the air traffic control problem
NASA Technical Reports Server (NTRS)
Hart, S. G.; Mcpherson, D.; Kreifeldt, J.; Wemple, T. E.
1977-01-01
A terminal area air traffic control simulation was designed to study ways of accommodating increased air traffic density. The concepts that were investigated assumed the availability of the microwave landing system and data link and included: (1) multiple curved descending final approaches; (2) parallel runways certified for independent and simultaneous operation under IFR conditions; (3) closer spacing between successive aircraft; and (4) a distributed management system between the air and ground. Three groups each consisting of three pilots and two air traffic controllers flew a combined total of 350 approaches. Piloted simulators were supplied with computer generated traffic situation displays and flight instruments. The controllers were supplied with a terminal area map and digital status information. Pilots and controllers also reported that the distributed management procedure was somewhat more safe and orderly than the centralized management procedure. Flying precision increased as the amount of turn required to intersect the outer mark decreased. Pilots reported that they preferred the alternative of multiple curved descending approaches with wider spacing between aircraft to closer spacing on single, straight in finals while controllers preferred the latter option. Both pilots and controllers felt that parallel runways are an acceptable way to accommodate increased traffic density safely and expeditiously.
Automatic recognition of vector and parallel operations in a higher level language
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1971-01-01
A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Archer, Charles J; Blocksome, Michael A; Cernohous, Bob R
Endpoint-based parallel data processing with non-blocking collective instructions in a PAMI of a parallel computer is disclosed. The PAMI is composed of data communications endpoints, each including a specification of data communications parameters for a thread of execution on a compute node, including specifications of a client, a context, and a task. The compute nodes are coupled for data communications through the PAMI. The parallel application establishes a data communications geometry specifying a set of endpoints that are used in collective operations of the PAMI by associating with the geometry a list of collective algorithms valid for use with the endpoints of the geometry; registering in each endpoint in the geometry a dispatch callback function for a collective operation; and executing without blocking, through a single one of the endpoints in the geometry, an instruction for the collective operation.
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads work perfectly well with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonly found patterns in current parallel codes. Moreover, we study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.
Integrated microfluidic devices for combinatorial cell-based assays.
Yu, Zeta Tak For; Kamei, Ken-ichiro; Takahashi, Hiroko; Shu, Chengyi Jenny; Wang, Xiaopu; He, George Wenfu; Silverman, Robert; Radu, Caius G; Witte, Owen N; Lee, Ki-Bum; Tseng, Hsian-Rong
2009-06-01
The development of miniaturized cell culture platforms for performing parallel cultures and combinatorial assays is important in cell biology from the single-cell level to the system level. In this paper we developed an integrated microfluidic cell-culture platform, Cell-microChip (Cell-μChip), for parallel analyses of the effects of microenvironmental cues (i.e., culture scaffolds) on different mammalian cells and their cellular responses to external stimuli. As a model study, we demonstrated the ability of culturing and assaying several mammalian cells, such as NIH 3T3 fibroblast, B16 melanoma and HeLa cell lines, in a parallel way. For functional assays, first we tested drug-induced apoptotic responses from different cell lines. As a second functional assay, we performed "on-chip" transfection of a reporter gene encoding an enhanced green fluorescent protein (EGFP) followed by live-cell imaging of transcriptional activation of cyclooxygenase 2 (Cox-2) expression. Collectively, our Cell-μChip approach demonstrated the capability to carry out parallel operations and the potential to further integrate advanced functions and applications in the broader space of combinatorial chemistry and biology.
Integrated microfluidic devices for combinatorial cell-based assays
Yu, Zeta Tak For; Kamei, Ken-ichiro; Takahashi, Hiroko; Shu, Chengyi Jenny; Wang, Xiaopu; He, George Wenfu; Silverman, Robert
2010-01-01
The development of miniaturized cell culture platforms for performing parallel cultures and combinatorial assays is important in cell biology from the single-cell level to the system level. In this paper we developed an integrated microfluidic cell-culture platform, Cell-microChip (Cell-μChip), for parallel analyses of the effects of microenvironmental cues (i.e., culture scaffolds) on different mammalian cells and their cellular responses to external stimuli. As a model study, we demonstrated the ability of culturing and assaying several mammalian cells, such as NIH 3T3 fibroblast, B16 melanoma and HeLa cell lines, in a parallel way. For functional assays, first we tested drug-induced apoptotic responses from different cell lines. As a second functional assay, we performed "on-chip" transfection of a reporter gene encoding an enhanced green fluorescent protein (EGFP) followed by live-cell imaging of transcriptional activation of cyclooxygenase 2 (Cox-2) expression. Collectively, our Cell-μChip approach demonstrated the capability to carry out parallel operations and the potential to further integrate advanced functions and applications in the broader space of combinatorial chemistry and biology. PMID:19130244
Parallel operation of NH3 screw compressors - the optimum way
NASA Astrophysics Data System (ADS)
Pijnenburg, B.; Ritmann, J.
2015-08-01
The use of a larger number of smaller industrial NH3 screw compressors operating in parallel appears to be the best way to achieve maximum part-load efficiency, increased redundancy, and other features in high demand in the industrial refrigeration industry today. Parallel operation can be configured to secure continuous operation and, in most applications, to improve overall operating economy. New compressors are developed to meet requirements for flexibility in operation and are controlled in an intelligent way. The intelligent control system keeps focus on all external demands while always striving to deliver the lowest possible absorbed power, including in future scenarios with a connection to a smart grid.
The Microgravity Science Glovebox
NASA Technical Reports Server (NTRS)
Baugher, Charles R.; Primm, Lowell (Technical Monitor)
2001-01-01
The Microgravity Science Glovebox (MSG) provides scientific investigators the opportunity to implement interactive experiments on the International Space Station. The facility has been designed around the concept of an enclosed scientific workbench that allows the crew to assemble and operate an experimental apparatus with participation from ground-based scientists through real-time data and video links. Workbench utilities provided to operate the experiments include power, data acquisition, computer communications, vacuum, nitrogen, and specialized tools. Because the facility work area is enclosed and held at a negative pressure with respect to the crew living area, the requirements on the experiments for containment of small parts, particulates, fluids, and gasses are substantially reduced. This environment allows experiments to be constructed in close parallel with bench-type investigations performed in ground-based laboratories. Such an approach enables experimental scientists to develop hardware that more closely parallels their traditional laboratory experience and transfer these experiments into meaningful space-based research. When delivered to the ISS the MSG will represent a significant scientific capability that will be continuously available for a decade of evolutionary research.
Flight Test Evaluation of the Airborne Information for Lateral Spacing (AILS) Concept
NASA Technical Reports Server (NTRS)
Abbott, Terence S.
2002-01-01
The Airborne Information for Lateral Spacing (AILS) concept is designed to support independent parallel approach operations to runways spaced as close as 2,500 feet. This report briefly describes the AILS operational concept and the results of a flight test of one implementation of this concept. The focus of this flight test experiment was to validate a prior simulator study, evaluating pilot performance, pilot acceptability, and minimum miss-distances for the rare situation in which an aircraft on one approach intrudes into the path of an aircraft on the other approach. Although the flight data set was not meant to be a statistically valid sample, the trends acquired in flight followed those of the simulator and therefore met the intent of validating the findings from the simulator. Results from this study showed that the design-goal mean miss-distance of 1,200 feet to potential collision situations was surpassed with an actual mean miss-distance of 1,859 feet. Pilot reaction times to the alerting system, which were an operational concern, averaged 0.65 seconds, well below the design-goal reaction time of 2.0 seconds. From the results of both of these tests, it can be concluded that this operational concept, with supporting technology and procedures, may provide an operationally viable means for conducting simultaneous, independent instrument approaches to runways spaced as close as 2,500 feet.
Missile signal processing common computer architecture for rapid technology upgrade
NASA Astrophysics Data System (ADS)
Rabinkin, Daniel V.; Rutledge, Edward; Monticciolo, Paul
2004-10-01
Interceptor missiles process IR images to locate an intended target and guide the interceptor towards it. Signal processing requirements have increased as the sensor bandwidth increases and interceptors operate against more sophisticated targets. A typical interceptor signal processing chain is comprised of two parts. Front-end video processing operates on all pixels of the image and performs such operations as non-uniformity correction (NUC), image stabilization, frame integration and detection. Back-end target processing, which tracks and classifies targets detected in the image, performs such algorithms as Kalman tracking, spectral feature extraction and target discrimination. In the past, video processing was implemented using ASIC components or FPGAs because computation requirements exceeded the throughput of general-purpose processors. Target processing was performed using hybrid architectures that included ASICs, DSPs and general-purpose processors. The resulting systems tended to be function-specific, and required custom software development. They were developed using non-integrated toolsets and test equipment was developed along with the processor platform. The lifespan of a system utilizing the signal processing platform often spans decades, while the specialized nature of processor hardware and software makes it difficult and costly to upgrade. As a result, the signal processing systems often run on outdated technology, algorithms are difficult to update, and system effectiveness is impaired by the inability to rapidly respond to new threats. A new design approach is made possible by three developments: Moore's Law-driven improvement in computational throughput; a newly introduced vector computing capability in general-purpose processors; and a modern set of open interface software standards. Today's multiprocessor commercial-off-the-shelf (COTS) platforms have sufficient throughput to support interceptor signal processing requirements.
This application may be programmed under existing real-time operating systems using parallel processing software libraries, resulting in highly portable code that can be rapidly migrated to new platforms as processor technology evolves. Use of standardized development tools and third-party software upgrades is enabled, as well as rapid upgrade of processing components as improved algorithms are developed. The resulting weapon system will have a superior processing capability over a custom approach at the time of deployment as a result of shorter development cycles and use of newer technology. The signal processing computer may be upgraded over the lifecycle of the weapon system, and can migrate between weapon system variants enabled by modification simplicity. This paper presents a reference design using the new approach that utilizes an Altivec PowerPC parallel COTS platform. It uses a VxWorks-based real-time operating system (RTOS), and application code developed using an efficient parallel vector library (PVL). A quantification of computing requirements and demonstration of an interceptor algorithm operating on this real-time platform are provided.
Introducing a distributed unstructured mesh into gyrokinetic particle-in-cell code, XGC
NASA Astrophysics Data System (ADS)
Yoon, Eisung; Shephard, Mark; Seol, E. Seegyoung; Kalyanaraman, Kaushik
2017-10-01
XGC has shown good scalability for large leadership supercomputers. The current production version uses a copy of the entire unstructured finite element mesh on every MPI rank. Besides being an obvious scalability issue if the mesh sizes are to be dramatically increased, the current approach is also not optimal with respect to data locality of particles and mesh information. To address these issues we have initiated the development of a distributed mesh PIC method. This approach directly addresses the base scalability issue with respect to mesh size and, through the use of a mesh-entity-centric view of the particle-mesh relationship, provides opportunities to address data locality needs of many-core and GPU-supported heterogeneous systems. The parallel mesh PIC capabilities are being built on the Parallel Unstructured Mesh Infrastructure (PUMI). The presentation will first overview the form of mesh distribution used and indicate the structures and functions used to support the mesh, the particles and their interaction. Attention will then focus on the node-level optimizations being carried out to ensure performant operation of all PIC operations on the distributed mesh. Partnership for Edge Physics Simulation (EPSI) Grant No. DE-SC0008449 and Center for Extended Magnetohydrodynamic Modeling (CEMM) Grant No. DE-SC0006618.
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes
Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey
2010-10-05
In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
Mobile and replicated alignment of arrays in data-parallel programs
NASA Technical Reports Server (NTRS)
Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert
1993-01-01
When a data-parallel language like FORTRAN 90 is compiled for a distributed-memory machine, aggregate data objects (such as arrays) are distributed across the processor memories. The mapping determines the amount of residual communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: first, an alignment that maps all the objects to an abstract template, and then a distribution that maps the template to the processors. We solve two facets of the problem of finding alignments that reduce residual communication: we determine alignments that vary in loops, and objects that should have replicated alignments. We show that loop-dependent mobile alignment is sometimes necessary for optimum performance, and we provide algorithms with which a compiler can determine good mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself (via spread operations) or can be used to improve performance. We propose an algorithm based on network flow that determines which objects to replicate so as to minimize the total amount of broadcast communication in replication. This work on mobile and replicated alignment extends our earlier work on determining static alignment.
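The two-stage mapping described in this abstract (alignment to an abstract template, then distribution of the template over processors) can be illustrated with a small sketch. This is not the authors' algorithm: the function names, the affine alignment, and the block distribution are assumptions chosen for illustration only. It counts how many elements of an elementwise operation on two arrays would need residual communication for a given pair of alignment offsets.

```python
def align(index, offset, stride=1):
    """Alignment stage (hypothetical affine form): array index -> template cell."""
    return stride * index + offset


def distribute(cell, template_size, num_procs):
    """Distribution stage (assumed block distribution): template cell -> processor."""
    block = -(-template_size // num_procs)  # ceiling division
    return cell // block


def owner(index, offset, template_size, num_procs, stride=1):
    """Processor that holds a given array element after both stages."""
    return distribute(align(index, offset, stride), template_size, num_procs)


def residual_communication(n, offset_a, offset_b, template_size, num_procs):
    """Count indices i for which A(i) and B(i) end up on different processors."""
    return sum(
        owner(i, offset_a, template_size, num_procs)
        != owner(i, offset_b, template_size, num_procs)
        for i in range(n)
    )
```

With 16 elements, a template of 20 cells, and 4 processors, identical offsets give no residual communication, while shifting one array by a single cell leaves one mismatched element at each block boundary the arrays cross.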
Miao Meng; Kiani, Mehdi
2016-08-01
In order to achieve efficient wireless power transmission (WPT) to biomedical implants with millimeter (mm) dimensions, ultrasonic WPT links have recently been proposed. Operating both transmitter (Tx) and receiver (Rx) ultrasonic transducers at their resonance frequency (fr) is key in improving power transmission efficiency (PTE). In this paper, different resonance configurations for Tx and Rx transducers, including series and parallel resonance, have been studied to help the designers of ultrasonic WPT links to choose the optimal resonance configuration for Tx and Rx that maximizes PTE. The geometries for disk-shaped transducers of four different sets of links, operating at series-series, series-parallel, parallel-series, and parallel-parallel resonance configurations in Tx and Rx, have been found through finite-element method (FEM) simulation tools for operation at fr of 1.4 MHz. Our simulation results suggest that operating the Tx transducer with parallel resonance increases PTE, while the resonance configuration of the mm-sized Rx transducer highly depends on the load resistance, Rl. For applications that involve large Rl in the order of tens of kΩ, a parallel resonance for a mm-sized Rx leads to higher PTE, while series resonance is preferred for Rl in the order of several kΩ and below.
2015-06-01
cient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show...apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying...repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator
HALOS: fast, autonomous, holographic adaptive optics
NASA Astrophysics Data System (ADS)
Andersen, Geoff P.; Gelsinger-Austin, Paul; Gaddipati, Ravi; Gaddipati, Phani; Ghebremichael, Fassil
2014-08-01
We present progress on our holographic adaptive laser optics system (HALOS): a compact, closed-loop aberration correction system that uses a multiplexed hologram to deconvolve the phase aberrations in an input beam. The wavefront characterization is based on simple, parallel measurements of the intensity of fixed focal spots and does not require any complex calculations. As such, the system does not require a computer and is thus much cheaper and less complex than conventional approaches. We present details of a fully functional, closed-loop prototype incorporating a 32-element MEMS mirror, operating at a bandwidth of over 10 kHz. Additionally, since the all-optical sensing is made in parallel, the speed is independent of actuator number, running at the same bandwidth for one actuator as for a million.
Students' Adoption of Course-Specific Approaches to Learning in Two Parallel Courses
ERIC Educational Resources Information Center
Öhrstedt, Maria; Lindfors, Petra
2016-01-01
Research on students' adoption of course-specific approaches to learning in parallel courses is limited and inconsistent. This study investigated second-semester psychology students' levels of deep, surface and strategic approaches in two courses running in parallel within a real-life university setting. The results showed significant differences…
Lü, Qiang; Xia, Xiao-Yan; Chen, Rong; Miao, Da-Jun; Chen, Sha-Sha; Quan, Li-Jun; Li, Hai-Ou
2012-01-01
Background Protein structure prediction (PSP), which is usually modeled as a computational optimization problem, remains one of the biggest challenges in computational biology. PSP encounters two difficult obstacles: the inaccurate energy function problem and the searching problem. Even if the lowest energy is found by the searching procedure, the correct protein structure is not guaranteed to be obtained. Results A general parallel metaheuristic approach is presented to tackle the above two problems. Multi-energy functions are employed to simultaneously guide the parallel searching threads. Searching trajectories are in fact controlled by the parameters of heuristic algorithms. The parallel approach allows the parameters to be perturbed while the searching threads are running in parallel, with each thread searching for the lowest energy value determined by an individual energy function. By hybridizing the intelligences of parallel ant colonies and Monte Carlo Metropolis search, this paper demonstrates an implementation of our parallel approach for PSP. 16 classical instances were tested to show that the parallel approach is competitive for solving the PSP problem. Conclusions This parallel approach combines various sources of both searching intelligences and energy functions, and thus predicts protein conformations with good quality jointly determined by all the parallel searching threads and energy functions. It provides a framework to combine different searching intelligence embedded in heuristic algorithms. It also constructs a container to hybridize different not-so-accurate objective functions which are usually derived from the domain expertise. PMID:23028708
Mulder, Samuel A; Wunsch, Donald C
2003-01-01
The Traveling Salesman Problem (TSP) is a very hard optimization problem in the field of operations research. It has been shown to be NP-complete, and is an often-used benchmark for new optimization techniques. One of the main challenges with this problem is that standard, non-AI heuristic approaches such as the Lin-Kernighan algorithm (LK) and the chained LK variant are currently very effective and in wide use for the common fully connected, Euclidean variant that is considered here. This paper presents an algorithm that uses adaptive resonance theory (ART) in combination with a variation of the Lin-Kernighan local optimization algorithm to solve very large instances of the TSP. The primary advantage of this algorithm over traditional LK and chained-LK approaches is the increased scalability and parallelism allowed by the divide-and-conquer clustering paradigm. Tours obtained by the algorithm are of lower quality, but scaling is much better and there is a high potential for increasing performance using parallel hardware.
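The divide-and-conquer idea in this abstract can be sketched in a few lines, under loud assumptions: the sketch below swaps in a simple split into vertical strips for the ART clustering and a plain 2-opt improvement for the Lin-Kernighan variant, and all function names are hypothetical. It is meant only to show why the scheme parallelizes well (each cluster is optimized independently), not to reproduce the paper's method or its tour quality.

```python
import math


def path_length(points, order):
    """Length of the open path that visits the cities in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def two_opt(points, order):
    """Stand-in local optimizer: reverse segments while that shortens the path."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for k in range(i + 1, len(order)):
                candidate = order[:i] + order[i:k + 1][::-1] + order[k + 1:]
                if path_length(points, candidate) < path_length(points, order) - 1e-12:
                    order, improved = candidate, True
    return order


def clustered_tour(points, num_clusters):
    """Split cities into vertical strips, optimize each strip, then concatenate."""
    by_x = sorted(range(len(points)), key=lambda i: points[i][0])
    size = -(-len(by_x) // num_clusters)  # ceiling division
    tour = []
    for start in range(0, len(by_x), size):
        cluster = by_x[start:start + size]
        # Clusters do not share cities, so these calls could run in parallel.
        tour.extend(two_opt(points, cluster))
    return tour
```

Because the clusters are stitched together without any global refinement, the resulting tour is generally longer than one from an optimizer working on the whole instance, which mirrors the quality-for-scalability trade-off the abstract reports.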
Lessons Learned in Deploying the World's Largest Scale Lustre File System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dillow, David A; Fuller, Douglas; Wang, Feiyi
2010-01-01
The Spider system at the Oak Ridge National Laboratory's Leadership Computing Facility (OLCF) is the world's largest scale Lustre parallel file system. Envisioned as a shared parallel file system capable of delivering both the bandwidth and capacity requirements of the OLCF's diverse computational environment, the project had a number of ambitious goals. To support the workloads of the OLCF's diverse computational platforms, the aggregate performance and storage capacity of Spider exceed that of our previously deployed systems by a factor of 6x - 240 GB/sec, and 17x - 10 Petabytes, respectively. Furthermore, Spider supports over 26,000 clients concurrently accessing the file system, which exceeds our previously deployed systems by nearly 4x. In addition to these scalability challenges, moving to a center-wide shared file system required dramatically improved resiliency and fault-tolerance mechanisms. This paper details our efforts in designing, deploying, and operating Spider. Through a phased approach of research and development, prototyping, deployment, and transition to operations, this work has resulted in a number of insights into large-scale parallel file system architectures, from both the design and the operational perspectives. We present in this paper our solutions to issues such as network congestion, performance baselining and evaluation, file system journaling overheads, and high availability in a system with tens of thousands of components. We also discuss areas of continued challenges, such as stressed metadata performance and the need for file system quality of service, along with our efforts to address them. Finally, operational aspects of managing a system of this scale are discussed along with real-world data and observations.
Overview of hybrid fiber-coaxial network deployment in the deregulated UK environment
NASA Astrophysics Data System (ADS)
Cox, Alan L.
1995-11-01
Cable operators in the U.K. enjoy unprecedented license to construct networks and operate cable TV and telecommunications services within their franchise areas. In general, operators have built hybrid-fiber-coax (HFC) networks for cable TV in parallel with fiber-copper-pair networks for telephony. The commonly used network architectures are reviewed, together with their present and future capacities. Despite this dual-technology approach, there is considerable interest in the integration of telephony services onto the HFC network and the development of new interactive services for which HFC may be more suitable than copper pairs. Certain technological and commercial developments may have considerable significance for HFC networks and their operators. These include the digitalization of TV distribution and the rising demand for high-rate digital access lines. Possible scenarios are discussed.
Multidimensional Simulation Applied to Water Resources Management
NASA Astrophysics Data System (ADS)
Camara, A. S.; Ferreira, F. C.; Loucks, D. P.; Seixas, M. J.
1990-09-01
A framework for an integrated decision aiding simulation (IDEAS) methodology using numerical, linguistic, and pictorial entities and operations is introduced. IDEAS relies upon traditional numerical formulations, logical rules to handle linguistic entities with linguistic values, and a set of pictorial operations. Pictorial entities are defined by their shape, size, color, and position. Pictorial operators include reproduction (copy of a pictorial entity), mutation (expansion, rotation, translation, change in color), fertile encounters (intersection, reunion), and sterile encounters (absorption). Interaction between numerical, linguistic, and pictorial entities is handled through logical rules or a simplified vector calculus operation. This approach is shown to be applicable to various environmental and water resources management analyses using a model to assess the impacts of an oil spill. Future developments, including IDEAS implementation on parallel processing machines, are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chu, T.L.; Musicki, Z.; Kohut, P.
1994-06-01
During 1989, the Nuclear Regulatory Commission (NRC) initiated an extensive program to carefully examine the potential risks during low power and shutdown operations. The program includes two parallel projects being performed by Brookhaven National Laboratory (BNL) and Sandia National Laboratories (SNL). Two plants, Surry (pressurized water reactor) and Grand Gulf (boiling water reactor), were selected as the plants to be studied. The objectives of the program are to assess the risks of severe accidents initiated during plant operational states other than full power operation and to compare the estimated core damage frequencies, important accident sequences and other qualitative and quantitative results with those accidents initiated during full power operation as assessed in NUREG-1150. The objective of this report is to document the approach utilized in the Surry plant and discuss the results obtained. A parallel report for the Grand Gulf plant is prepared by SNL. This study shows that the core-damage frequency during mid-loop operation at the Surry plant is comparable to that of power operation. We recognize that there is very large uncertainty in the human error probabilities in this study. This study identified that only a few procedures are available for mitigating accidents that may occur during shutdown. Procedures written specifically for shutdown accidents would be useful. This document, Volume 2, Pt. 2 provides appendices A through D of this report.
Parallel image logical operations using cross correlation
NASA Technical Reports Server (NTRS)
Strong, J. P., III
1972-01-01
Methods are presented for counting areas in an image in a parallel manner using noncoherent optical techniques. The techniques presented include the Levialdi algorithm for counting, optical techniques for binary operations, and cross-correlation.
One tool - one team: the marriage of test and operations in a low-budget spacecraft development
NASA Astrophysics Data System (ADS)
Finley, Charles J.
2006-05-01
The Air Force Research Laboratory's Space Vehicles Directorate (AFRL/VS) and the Department of Defense Space Test Program (STP) are two organizations that have partnered on more than 85 missions since 1968 to develop, launch, and operate Research and Development, Test and Evaluation space missions. As valuable as these missions have been to the follow-on generation of Operational systems, they are consistently under-funded and forced to execute on excessively ambitious development schedules. Due to these constraints, space mission development teams that serve the RDT&E community are faced with a number of unique technical and programmatic challenges. AFRL and STP have taken various approaches throughout the mission lifecycle to accelerate their development schedules, without sacrificing cost or system reliability. In the areas of test and operations, they currently employ one of two strategies. Historically, they have sought to avoid the added cost and complexity associated with coupled development schedules and segregated the spacecraft development and test effort from the ground operations system development and test effort. However, because these efforts have far more in common than they have differences, they have more recently attempted to pursue parallel I&T and Operations development and readiness efforts. This paper seeks to compare and contrast the "decoupled test and operations" approach, used by such missions as C/NOFS and Coriolis, with the "coupled test and operations" approach, adopted by the XSS-11 and TacSat-2 missions.
Efficient parallel implicit methods for rotary-wing aerodynamics calculations
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.
Euler/Navier-Stokes Computational Fluid Dynamics (CFD) methods are commonly used for prediction of the aerodynamics and aeroacoustics of modern rotary-wing aircraft. However, their widespread application to large complex problems is limited by a lack of adequate computing power. Parallel processing offers the potential for dramatic increases in computing power, but most conventional implicit solution methods are inefficient in parallel and new techniques must be adopted to realize its potential. This work proposes alternative implicit schemes for Euler/Navier-Stokes rotary-wing calculations which are robust and efficient in parallel. The first part of this work proposes an efficient parallelizable modification of the Lower Upper-Symmetric Gauss Seidel (LU-SGS) implicit operator used in the well-known Transonic Unsteady Rotor Navier Stokes (TURNS) code. The new hybrid LU-SGS scheme couples a point-relaxation approach of the Data Parallel-Lower Upper Relaxation (DP-LUR) algorithm for inter-processor communication with the Symmetric Gauss Seidel algorithm of LU-SGS for on-processor computations. With the modified operator, TURNS is implemented in parallel using Message Passing Interface (MPI) for communication. Numerical performance and parallel efficiency are evaluated on the IBM SP2 and Thinking Machines CM-5 multi-processors for a variety of steady-state and unsteady test cases. The hybrid LU-SGS scheme maintains the numerical performance of the original LU-SGS algorithm in all cases and shows a good degree of parallel efficiency. It experiences a higher degree of robustness than DP-LUR for third-order upwind solutions. The second part of this work examines use of Krylov subspace iterative solvers for the nonlinear CFD solutions. The hybrid LU-SGS scheme is used as a parallelizable preconditioner. Two iterative methods are tested, Generalized Minimum Residual (GMRES) and Orthogonal s-Step Generalized Conjugate Residual (OSGCR).
The Newton method demonstrates good parallel performance on the IBM SP2, with OS-GCR giving slightly better performance than GMRES on large numbers of processors. For steady and quasi-steady calculations, the convergence rate is accelerated but the overall solution time remains about the same as the standard hybrid LU-SGS scheme. For unsteady calculations, however, the Newton method maintains a higher degree of time-accuracy which allows tbe use of larger timesteps and results in CPU savings of 20-35%.
Large-Scale Parallel Viscous Flow Computations using an Unstructured Multigrid Algorithm
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1999-01-01
The development and testing of a parallel unstructured agglomeration multigrid algorithm for steady-state aerodynamic flows is discussed. The agglomeration multigrid strategy uses a graph algorithm to construct the coarse multigrid levels from the given fine grid, similar to an algebraic multigrid approach, but operates directly on the non-linear system using the FAS (Full Approximation Scheme) approach. The scalability and convergence rate of the multigrid algorithm are examined on the SGI Origin 2000 and the Cray T3E. An argument is given which indicates that the asymptotic scalability of the multigrid algorithm should be similar to that of its underlying single grid smoothing scheme. For medium size problems involving several million grid points, near perfect scalability is obtained for the single grid algorithm, while only a slight drop-off in parallel efficiency is observed for the multigrid V- and W-cycles, using up to 128 processors on the SGI Origin 2000, and up to 512 processors on the Cray T3E. For a large problem using 25 million grid points, good scalability is observed for the multigrid algorithm using up to 1450 processors on a Cray T3E, even when the coarsest grid level contains fewer points than the total number of processors.
Novel approach for image skeleton and distance transformation parallel algorithms
NASA Astrophysics Data System (ADS)
Qing, Kent P.; Means, Robert W.
1994-05-01
Image Understanding is more important in medical imaging than ever, particularly where real-time automatic inspection, screening and classification systems are installed. Skeleton and distance transformations are among the common operations that extract useful information from binary images and aid in Image Understanding. The distance transformation describes the objects in an image by labeling every pixel in each object with the distance to its nearest boundary. The skeleton algorithm starts from the distance transformation and finds the set of pixels that have a locally maximum label. The distance algorithm has to scan the entire image several times depending on the object width. For each pixel, the algorithm must access the neighboring pixels and find the maximum distance from the nearest boundary. It is a computation- and memory-access-intensive procedure. In this paper, we propose a novel parallel approach to the distance transform and skeleton algorithms using the latest VLSI high-speed convolutional chips such as HNC's ViP. The algorithm speed is dependent on the object's width and takes (k + [(k-1)/3]) * 7 milliseconds for a 512 X 512 image with k being the maximum distance of the largest object. All objects in the image will be skeletonized at the same time in parallel.
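The distance transformation and skeleton described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' convolution-chip implementation: it assumes a city-block (4-neighbor) distance, treats pixels outside the image as background, and takes "locally maximum" to mean "not smaller than any 4-neighbor". The function names `label`, `distance_transform`, and `skeleton` are hypothetical.

```python
def label(dist, r, c):
    """Label at (r, c); positions outside the image count as background (0)."""
    if 0 <= r < len(dist) and 0 <= c < len(dist[0]):
        return dist[r][c]
    return 0

def neighbors(r, c):
    """The four edge-adjacent neighbors of (r, c) (assumed neighborhood)."""
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

def distance_transform(image):
    """Label each object pixel with its city-block distance to the boundary.

    The whole image is rescanned until nothing changes, so the number of
    productive passes grows with the object width.
    """
    rows, cols = len(image), len(image[0])
    unknown = rows + cols  # larger than any possible distance in the image
    dist = [[unknown if px else 0 for px in row] for row in image]
    passes = 0
    while True:
        new = [
            [
                min(dist[r][c], 1 + min(label(dist, nr, nc) for nr, nc in neighbors(r, c)))
                if image[r][c] else 0
                for c in range(cols)
            ]
            for r in range(rows)
        ]
        if new == dist:
            return dist, passes
        dist, passes = new, passes + 1

def skeleton(dist):
    """Object pixels whose label is not smaller than any 4-neighbor's label."""
    return {
        (r, c)
        for r in range(len(dist))
        for c in range(len(dist[0]))
        if dist[r][c] > 0
        and all(dist[r][c] >= label(dist, nr, nc) for nr, nc in neighbors(r, c))
    }

# A 3 x 3 square object centered in a 5 x 5 image.
image = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
dist, passes = distance_transform(image)
skel = skeleton(dist)
```

Each pass updates every pixel from the previous pass's labels only, so all pixels of all objects could be updated simultaneously, which is the property a parallel implementation would exploit.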
Scalable parallel communications
NASA Technical Reports Server (NTRS)
Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.
1992-01-01
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines.
In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCPs running in parallel provide high bandwidth service to a single application); and (3) coarse-grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
Laplante, Karine; Sébastien, Boutin; Derome, Nicolas
2013-01-01
Heavy metals released by anthropogenic activities such as mining trigger profound changes to bacterial communities. In this study we used 16S SSU rRNA gene high-throughput sequencing to characterize the impact of a polymetallic perturbation and other environmental parameters on taxonomic networks within five lacustrine bacterial communities from sites located near Rouyn-Noranda, Quebec, Canada. The results showed that community equilibrium was disturbed in terms of both diversity and structure. Moreover, heavy metals, especially cadmium combined with water acidity, induced parallel changes among sites via the selection of resistant OTUs (Operational Taxonomic Unit) and taxonomic dominance perturbations favoring the Alphaproteobacteria. Furthermore, under a similar selective pressure, covariation trends between phyla revealed conservation and parallelism within interphylum interactions. Our study sheds light on the importance of analyzing communities not only from a phylogenetic perspective but also including a quantitative approach to provide significant insights into the evolutionary forces that shape the dynamic of the taxonomic interaction networks in bacterial communities. PMID:23789031
Brian hears: online auditory processing using vectorization over channels.
Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
Concurrency-based approaches to parallel programming
NASA Technical Reports Server (NTRS)
Kale, L.V.; Chrisochoides, N.; Kohl, J.; Yelick, K.
1995-01-01
The inevitable transition to parallel programming can be facilitated by appropriate tools, including languages and libraries. After describing the needs of applications developers, this paper presents three specific approaches aimed at development of efficient and reusable parallel software for irregular and dynamic-structured problems. A salient feature of all three approaches is their exploitation of concurrency within a processor. Benefits of individual approaches such as these can be leveraged by an interoperability environment which permits modules written using different approaches to co-exist in single applications.
Parallel processing considerations for image recognition tasks
NASA Astrophysics Data System (ADS)
Simske, Steven J.
2011-01-01
Many image recognition tasks are well-suited to parallel processing. The most obvious example is that many imaging tasks require the analysis of multiple images. From this standpoint, then, parallel processing need be no more complicated than assigning individual images to individual processors. However, there are three less trivial categories of parallel processing that will be considered in this paper: parallel processing (1) by task; (2) by image region; and (3) by meta-algorithm. Parallel processing by task allows the assignment of multiple workflows (as diverse as optical character recognition [OCR], document classification and barcode reading) to parallel pipelines. This can substantially decrease time to completion for the document tasks. For this approach, each parallel pipeline is generally performing a different task. Parallel processing by image region allows a larger imaging task to be sub-divided into a set of parallel pipelines, each performing the same task but on a different data set. This type of image analysis is readily addressed by a map-reduce approach. Examples include document skew detection and multiple face detection and tracking. Finally, parallel processing by meta-algorithm allows different algorithms to be deployed on the same image simultaneously. This approach may result in improved accuracy.
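Parallel processing by image region, as described in the abstract above, maps naturally onto a map-reduce pattern. The sketch below is a hypothetical illustration, not code from the paper: the per-region task (counting dark pixels), the strip-based split, and the names `count_dark_pixels`, `split_into_strips`, and `parallel_dark_pixel_count` are all assumptions chosen to keep the example small. It assumes a non-empty image given as a list of rows of gray levels.

```python
from concurrent.futures import ThreadPoolExecutor

def count_dark_pixels(region, threshold=128):
    """Map step: the same task applied to one image region (rows of gray levels)."""
    return sum(1 for row in region for px in row if px < threshold)

def split_into_strips(image, num_strips):
    """Divide the image into horizontal strips of roughly equal height."""
    rows_per_strip = max(1, -(-len(image) // num_strips))  # ceiling division
    return [image[i:i + rows_per_strip] for i in range(0, len(image), rows_per_strip)]

def parallel_dark_pixel_count(image, num_workers=4):
    """Run the map step on every strip concurrently, then reduce by summing."""
    strips = split_into_strips(image, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_counts = list(pool.map(count_dark_pixels, strips))
    return sum(partial_counts)

# Synthetic 10 x 16 test image with gray levels in 0..255.
image = [[(r * 37 + c * 11) % 256 for c in range(16)] for r in range(10)]
total = parallel_dark_pixel_count(image)
```

Threads keep the sketch self-contained; for CPU-bound work in CPython a process pool or separate machines would be the more likely choice, with the same map and reduce structure.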
NASA Astrophysics Data System (ADS)
Rizki, Permata Nur Miftahur; Lee, Heezin; Lee, Minsu; Oh, Sangyoon
2017-01-01
With the rapid advance of remote sensing technology, the amount of three-dimensional point-cloud data has increased extraordinarily, requiring faster processing in the construction of digital elevation models. There have been several attempts to accelerate the computation using parallel methods; however, little attention has been given to investigating different approaches for selecting the most suited parallel programming model for a given computing environment. We present our findings and insights identified by implementing three popular high-performance parallel approaches (message passing interface, MapReduce, and GPGPU) on time demanding but accurate kriging interpolation. The performances of the approaches are compared by varying the size of the grid and input data. In our empirical experiment, we demonstrate the significant acceleration by all three approaches compared to a C-implemented sequential-processing method. In addition, we also discuss the pros and cons of each method in terms of usability, complexity infrastructure, and platform limitation to give readers a better understanding of utilizing those parallel approaches for gridding purposes.
GASPACHO: a generic automatic solver using proximal algorithms for convex huge optimization problems
NASA Astrophysics Data System (ADS)
Goossens, Bart; Luong, Hiêp; Philips, Wilfried
2017-08-01
Many inverse problems (e.g., demosaicking, deblurring, denoising, image fusion, HDR synthesis) share various similarities: degradation operators are often modeled by a specific data fitting function while image prior knowledge (e.g., sparsity) is incorporated by additional regularization terms. In this paper, we investigate automatic algorithmic techniques for evaluating proximal operators. These algorithmic techniques also enable efficient calculation of adjoints from linear operators in a general matrix-free setting. In particular, we study the simultaneous-direction method of multipliers (SDMM) and the parallel proximal algorithm (PPXA) solvers and show that the automatically derived implementations are well suited for both single-GPU and multi-GPU processing. We demonstrate this approach for an Electron Microscopy (EM) deconvolution problem.
Effecting a broadcast with an allreduce operation on a parallel computer
Almasi, Gheorghe; Archer, Charles J.; Ratterman, Joseph D.; Smith, Brian E.
2010-11-02
A parallel computer comprises a plurality of compute nodes organized into at least one operational group for collective parallel operations. Each compute node is assigned a unique rank and is coupled for data communications through a global combining network. One compute node is assigned to be a logical root. A send buffer and a receive buffer are configured. Each element of a contribution of the logical root in the send buffer is contributed. One or more zeros corresponding to a size of the element are injected. An allreduce operation with a bitwise OR using the element and the injected zeros is performed. The result of the allreduce operation is determined and stored in each receive buffer.
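The idea in the abstract above rests on the identity x OR 0 = x: if only the logical root contributes real data and every other rank contributes zeros, a bitwise-OR allreduce leaves the root's data in every receive buffer. The following is a minimal single-process simulation of that idea, not the patented implementation; the function name `broadcast_via_allreduce` and the list-of-integers buffers are assumptions made for illustration.

```python
from functools import reduce
from operator import or_

def broadcast_via_allreduce(root_rank, root_data, num_nodes):
    """Simulate a broadcast built from an allreduce with bitwise OR."""
    # Send buffers: the logical root contributes its elements; every other
    # rank injects zeros of the same size.
    send_buffers = [
        list(root_data) if rank == root_rank else [0] * len(root_data)
        for rank in range(num_nodes)
    ]
    # Allreduce: element-wise bitwise OR across all ranks. Because
    # x | 0 == x, the combined value equals the root's contribution.
    reduced = [reduce(or_, column) for column in zip(*send_buffers)]
    # Every rank stores the same result in its own receive buffer.
    return [list(reduced) for _ in range(num_nodes)]

receive_buffers = broadcast_via_allreduce(root_rank=2, root_data=[0x1F, 7, 0], num_nodes=4)
```

On a real machine the OR would be carried out by the combining network or a collective call rather than by a loop over simulated ranks.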
Fast parallel approach for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2009-12-01
Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real time image processing.
Kistler reusable vehicle facility design and operational approach
NASA Astrophysics Data System (ADS)
Fagan, D.; McInerney, F.; Johnston, C.; Tolson, B.
Kistler Aerospace Corporation is designing and developing the K-1, the world's first fully reusable aerospace vehicle to deliver satellites into orbit. The K-1 vehicle test program will be conducted in Woomera, Australia, with commercial operations scheduled to begin shortly afterwards. Both stages of the K-1 will return to the launch site utilizing parachutes and airbags for a soft landing within 24 h after launch. The turnaround flow of the two stages will cycle from landing site to a maintenance/refurbishment facility and through the next launch in only 9 days. Payload processing will occur in a separate facility in parallel with recovery and refurbishment operations. The vehicle design and on-board checkout capability of the avionics system eliminates the need for an abundance of ground checkout equipment. Payload integration, vehicle assembly, and K-1 transport to the launch pad will be performed horizontally, simplifying processing and reducing infrastructure requirements. This simple, innovative, and cost-effective approach will allow Kistler to offer its customers flexible, low-cost, and on-demand launch services.
Serial DNA relay in DNA logic gates by electrical fusion and mechanical splitting of droplets
Kawano, Ryuji; Takinoue, Masahiro; Osaki, Toshihisa; Kamiya, Koki; Miki, Norihisa
2017-01-01
DNA logic circuits utilizing DNA hybridization and/or enzymatic reactions have drawn increasing attention for their potential applications in the diagnosis and treatment of cellular diseases. The compartmentalization of such a system into a microdroplet considerably helps to precisely regulate local interactions and reactions between molecules. In this study, we introduced a relay approach for enabling the transfer of DNA from one droplet to another to implement multi-step sequential logic operations. We proposed electrical fusion and mechanical splitting of droplets to facilitate the DNA flow at the inputs, logic operation, output, and serial connection between two logic gates. We developed Negative-OR operations integrated by a serial connection of the OR gate and NOT gate incorporated in a series of droplets. The four types of input defined by the presence/absence of DNA in the input droplet pair were correctly reflected in the readout at the Negative-OR gate. The proposed approach potentially allows for serial and parallel logic operations that could be used for complex diagnostic applications. PMID:28700641
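The Negative-OR (NOR) operation described above is the serial connection of an OR gate and a NOT gate. A small Boolean sketch makes the expected readout for the four input types explicit; here `True` stands for the presence of DNA in a droplet, which is an assumed encoding for illustration, and the function names are hypothetical.

```python
def or_gate(a, b):
    """Output is present if either input is present."""
    return a or b

def not_gate(x):
    """Output is present only if the input is absent."""
    return not x

def nor_gate(a, b):
    """Negative-OR: the OR gate's output is relayed into the NOT gate."""
    return not_gate(or_gate(a, b))

# Readout for the four input types (presence/absence in the input droplet pair).
truth_table = {
    (a, b): nor_gate(a, b)
    for a in (False, True)
    for b in (False, True)
}
```

Only the input pair with no DNA in either droplet yields a positive readout.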
Preliminary Human-in-the-Loop Assessment of Procedures for Very-Closely-Spaced Parallel Runways
NASA Technical Reports Server (NTRS)
Verma, Savita; Lozito, Sandra C.; Ballinger, Deborah S.; Trot, Greg; Hardy, Gordon H.; Panda, Ramesh C.; Lehmer, Ronald D.; Kozon, Thomas E.
2010-01-01
Demand in the future air transportation system concept is expected to double or triple by 2025 [1]. Increasing airport arrival rates will help meet the growing demand. That demand could be met with additional runways, but the expansion of airports is met with environmental challenges for the surrounding communities when using current standards and procedures. Therefore, changes to airport operations can improve airport capacity without adding runways. Building additional runways between current ones, or moving them closer, is a potential solution to meeting the increasing demand, as addressed by the Terminal Area Capacity Enhancing Concept (TACEC). TACEC requires robust technologies and procedures that need to be tested such that operations are not compromised under instrument meteorological conditions. The reduction of runway spacing for independent simultaneous operations dramatically exacerbates the criticality of wake vortex incursion and the calculation of a safe and proper breakout maneuver. The study presented here developed guidelines for such operations by performing a real-time, human-in-the-loop simulation using precision navigation, autopilot-flown approaches, with the pilot monitoring aircraft spacing and the wake vortex safe zone during the approach.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
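A single edge switch, and the simple-graph constraint that makes successive switches dependent on one another, can be sketched sequentially as follows. This is not the paper's distributed-memory algorithm; it is a hypothetical serial illustration for an undirected graph stored as a set of normalized vertex pairs, and the names `norm`, `try_edge_switch`, and `degrees` are assumptions.

```python
import random

def norm(u, v):
    """Store an undirected edge with its smaller endpoint first."""
    return (u, v) if u <= v else (v, u)

def try_edge_switch(edges, rng):
    """Pick two edges at random and swap one end vertex of each.

    The switch is rejected (returns False) if it would create a self-loop
    or a parallel edge, so the graph stays simple.
    """
    (u1, v1), (u2, v2) = rng.sample(sorted(edges), 2)
    if rng.random() < 0.5:          # choose which end of the second edge to swap
        u2, v2 = v2, u2
    if u1 == v2 or u2 == v1:        # would create a self-loop
        return False
    new1, new2 = norm(u1, v2), norm(u2, v1)
    if new1 in edges or new2 in edges or new1 == new2:  # would create a parallel edge
        return False
    edges.difference_update({norm(u1, v1), norm(u2, v2)})
    edges.update({new1, new2})
    return True

def degrees(edges):
    """Degree of every vertex that appears in at least one edge."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

# Start from a cycle on 8 vertices and attempt 200 switches.
rng = random.Random(0)
edges = {norm(i, (i + 1) % 8) for i in range(8)}
degrees_before = degrees(edges)
accepted = sum(try_edge_switch(edges, rng) for _ in range(200))
```

Every accepted switch preserves each vertex's degree, which is why repeated switching can be used to randomize a network with a given degree sequence. Each acceptance test reads the current edge set, so a parallel version has to coordinate processors that touch the same edges.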
Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆
Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-01-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
Implementation of a Parallel Kalman Filter for Stratospheric Chemical Tracer Assimilation
NASA Technical Reports Server (NTRS)
Chang, Lang-Ping; Lyster, Peter M.; Menard, R.; Cohn, S. E.
1998-01-01
A Kalman filter for the assimilation of long-lived atmospheric chemical constituents has been developed for two-dimensional transport models on isentropic surfaces over the globe. An important attribute of the Kalman filter is that it calculates error covariances of the constituent fields using the tracer dynamics. Consequently, the current Kalman-filter assimilation is a five-dimensional problem (coordinates of two points and time), and it can only be handled on computers with large memory and high floating point speed. In this paper, an implementation of the Kalman filter for distributed-memory, message-passing parallel computers is discussed. Two approaches were studied: an operator decomposition and a covariance decomposition. The latter was found to be more scalable than the former, and it possesses the property that the dynamical model does not need to be parallelized, which is of considerable practical advantage. This code is currently used to assimilate constituent data retrieved by limb sounders on the Upper Atmosphere Research Satellite. Tests of the code examined the variance transport and observability properties. Aspects of the parallel implementation, some timing results, and a brief discussion of the physical results will be presented.
An Asynchronous Many-Task Implementation of In-Situ Statistical Analysis using Legion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pebay, Philippe Pierre; Bennett, Janine Camille
2015-11-01
In this report, we propose a framework for the design and implementation of in-situ analyses using an asynchronous many-task (AMT) model, using the Legion programming model together with the MiniAero mini-application as a surrogate for full-scale parallel scientific computing applications. The bulk of this work consists of converting the Learn/Derive/Assess model which we had initially developed for parallel statistical analysis using MPI [PTBM11], from a SPMD to an AMT model. To this end, we propose an original use of the concept of Legion logical regions as a replacement for the parallel communication schemes used for the only operation of the statistics engines that requires explicit communication. We then evaluate this proposed scheme in a shared memory environment, using the Legion port of MiniAero as a proxy for a full-scale scientific application, as a means to provide input data sets of variable size for the in-situ statistical analyses in an AMT context. We demonstrate in particular that the approach has merit, and warrants further investigation, in collaboration with ongoing efforts to improve the overall parallel performance of the Legion system.
Parallel database search and prime factorization with magnonic holographic memory devices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khitun, Alexander
In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device, which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by the phased array of spin wave generating elements, allowing the production of phase patterns of an arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over the conventional digital logic circuits in special task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.
Parallel database search and prime factorization with magnonic holographic memory devices
NASA Astrophysics Data System (ADS)
Khitun, Alexander
2015-12-01
In this work, we describe the capabilities of Magnonic Holographic Memory (MHM) for parallel database search and prime factorization. MHM is a type of holographic device, which utilizes spin waves for data transfer and processing. Its operation is based on the correlation between the phases and the amplitudes of the input spin waves and the output inductive voltage. The input of MHM is provided by the phased array of spin wave generating elements, allowing the production of phase patterns of an arbitrary form. The latter makes it possible to code logic states into the phases of propagating waves and exploit wave superposition for parallel data processing. We present the results of numerical modeling illustrating parallel database search and prime factorization. The results of numerical simulations on the database search are in agreement with the available experimental data. The use of classical wave interference may result in a significant speedup over the conventional digital logic circuits in special task data processing (e.g., √n in database search). Potentially, magnonic holographic devices can be implemented as complementary logic units to digital processors. Physical limitations and technological constraints of the spin wave approach are also discussed.
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network
NASA Astrophysics Data System (ADS)
Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan
2017-01-01
The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). Specialists in the field of parallel and distributed computing pay special attention to the scalability of applications for problem solving. Effective management of a scalable application in a heterogeneous distributed computing environment is still a non-trivial issue, and it is especially relevant to control systems that operate in networks. We propose a new approach to multi-agent management for scalable applications in a heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for automating the problem solving. Advantages of the proposed approach are demonstrated on the parametric synthesis example of the static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automation of the multi-agent control for the systems in a parallel mode with various degrees of its detailed elaboration.
Zhang, Xuejun; Lei, Jiaxing
2015-01-01
Considering reducing the airspace congestion and the flight delay simultaneously, this paper formulates the airway network flow assignment (ANFA) problem as a multiobjective optimization model and presents a new multiobjective optimization framework to solve it. Firstly, an effective multi-island parallel evolution algorithm with multiple evolution populations is employed to improve the optimization capability. Secondly, the nondominated sorting genetic algorithm II is applied for each population. In addition, a cooperative coevolution algorithm is adapted to divide the ANFA problem into several low-dimensional biobjective optimization problems which are easier to deal with. Finally, in order to maintain the diversity of solutions and to avoid prematurity, a dynamic adjustment operator based on solution congestion degree is specifically designed for the ANFA problem. Simulation results using the real traffic data from China air route network and daily flight plans demonstrate that the proposed approach can improve the solution quality effectively, showing superiority to the existing approaches such as the multiobjective genetic algorithm, the well-known multiobjective evolutionary algorithm based on decomposition, and a cooperative coevolution multiobjective algorithm as well as other parallel evolution algorithms with different migration topology. PMID:26180840
Performing an allreduce operation on a plurality of compute nodes of a parallel computer
Faraj, Ahmad [Rochester, MN]
2012-04-17
Methods, apparatus, and products are disclosed for performing an allreduce operation on a plurality of compute nodes of a parallel computer. Each compute node includes at least two processing cores. Each processing core has contribution data for the allreduce operation. Performing an allreduce operation on a plurality of compute nodes of a parallel computer includes: establishing one or more logical rings among the compute nodes, each logical ring including at least one processing core from each compute node; performing, for each logical ring, a global allreduce operation using the contribution data for the processing cores included in that logical ring, yielding a global allreduce result for each processing core included in that logical ring; and performing, for each compute node, a local allreduce operation using the global allreduce results for each processing core on that compute node.
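The two-stage scheme in the abstract above (a global allreduce per logical ring, then a local allreduce per compute node) can be simulated in a single process. The sketch below makes several assumptions that the abstract does not state: the reduction operator is addition, every node has the same number of cores, and ring r is formed from core r of each node. The function name `ring_based_allreduce` is hypothetical.

```python
def ring_based_allreduce(contributions):
    """Simulate a ring-then-local allreduce with addition as the operator.

    contributions[node][core] is the contribution of one processing core.
    Returns results[node][core], the value each core holds at the end.
    """
    num_cores = len(contributions[0])
    # Stage 1: one logical ring per core index (assumed layout). The global
    # allreduce on ring r combines core r's contribution from every node.
    ring_results = [
        sum(node[core] for node in contributions) for core in range(num_cores)
    ]
    # After stage 1, core r on every node holds the result of ring r.
    per_node_ring_values = [list(ring_results) for _ in contributions]
    # Stage 2: a local allreduce on each node combines the ring results
    # held by that node's cores, giving the full reduction on every core.
    return [[sum(values)] * num_cores for values in per_node_ring_values]

# Three compute nodes with two cores each.
contributions = [[1, 2], [3, 4], [5, 6]]
results = ring_based_allreduce(contributions)
```

With these contributions the ring results are 9 and 12, and the local stage combines them so that every core ends with the total of all six contributions.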
Reliability Modeling Methodology for Independent Approaches on Parallel Runways Safety Analysis
NASA Technical Reports Server (NTRS)
Babcock, P.; Schor, A.; Rosch, G.
1998-01-01
This document is an adjunct to the final report An Integrated Safety Analysis Methodology for Emerging Air Transport Technologies. That report presents the results of our analysis of the problem of simultaneous but independent approaches of two aircraft on parallel runways (independent approaches on parallel runways, or IAPR). This introductory chapter presents a brief overview and perspective of approaches and methodologies for performing safety analyses for complex systems. Ensuing chapters provide the technical details that underlie the approach that we have taken in performing the safety analysis for the IAPR concept.
Linearly exact parallel closures for slab geometry
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun
2013-08-01
Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
NASA Technical Reports Server (NTRS)
Springer, P.
1993-01-01
This paper discusses how the Cascade-Correlation algorithm was parallelized so that it could be run using the Time Warp Operating System (TWOS). TWOS is a special-purpose operating system designed to run parallel discrete event simulations with maximum efficiency on parallel or distributed computers.
Making almost commuting matrices commute
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hastings, Matthew B
Suppose two Hermitian matrices A, B almost commute (‖[A,B]‖ ≤ δ). Are they close to a commuting pair of Hermitian matrices, A', B', with ‖A-A'‖, ‖B-B'‖ ≤ ε? A theorem of H. Lin shows that this is uniformly true, in that for every ε > 0 there exists a δ > 0, independent of the size N of the matrices, for which almost commuting implies being close to a commuting pair. However, this theorem does not specify how δ depends on ε. We give uniform bounds relating δ and ε. The proof is constructive, giving an explicit algorithm to construct A' and B'. We provide tighter bounds in the case of block tridiagonal and tridiagonal matrices. Within the context of quantum measurement, this implies an algorithm to construct a basis in which we can make a projective measurement that approximately measures two approximately commuting operators simultaneously. Finally, we comment briefly on the case of approximately measuring three or more approximately commuting operators using POVMs (positive operator-valued measures) instead of projective measurements.
NASA Astrophysics Data System (ADS)
Ghosh, B.; Hazra, S.; Haldar, N.; Roy, D.; Patra, S. N.; Swarnakar, J.; Sarkar, P. P.; Mukhopadhyay, S.
2018-03-01
Over the last few decades, optics has proved its strong potential for parallel logic, arithmetic and algebraic operations owing to its very high speed in communication and computation. Many logical and sequential operations using the all-optical frequency encoding technique have been proposed by several authors. Here, we adopt the all-optical dibit representation technique, which offers high-speed operation and reduces the bit error problem. Exploiting this technique, we propose all-optical frequency-encoded dibit-based XOR and XNOR logic gates using optical switches such as the add/drop multiplexer (ADM) and the reflective semiconductor optical amplifier (RSOA). The operation of these gates has been verified through simulation in MATLAB (R2008a).
Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M.; Grün, Sonja
2017-01-01
Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis. PMID:28596729
Scientific Data Services -- A High-Performance I/O System with Array Semantics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Kesheng; Byna, Surendra; Rotem, Doron
2011-09-21
As high-performance computing approaches exascale, the existing I/O system design is having trouble keeping pace in both performance and scalability. We propose to address this challenge by adopting database principles and techniques in parallel I/O systems. First, we propose to adopt an array data model because many scientific applications represent their data in arrays. This strategy follows a cardinal principle from database research, which separates the logical view from the physical layout of data. This high-level data model gives the underlying implementation more freedom to optimize the physical layout and to choose the most effective way of accessing the data. For example, knowing that a set of write operations is working on a single multi-dimensional array makes it possible to keep the subarrays in a log structure during the write operations and reassemble them later into another physical layout as resources permit. While maintaining the high-level view, the storage system could compress the user data to reduce the physical storage requirement, collocate data records that are frequently used together, or replicate data to increase availability and fault-tolerance. Additionally, the system could generate secondary data structures such as database indexes and summary statistics. We expect the proposed Scientific Data Services approach to create a “live” storage system that dynamically adjusts to user demands and evolves with the massively parallel storage hardware.
Tuning iteration space slicing based tiled multi-core code implementing Nussinov's RNA folding.
Palkowski, Marek; Bielecki, Wlodzimierz
2018-01-15
RNA folding is an ongoing compute-intensive task in bioinformatics. Parallelizing this kind of algorithm and improving its code locality are among the most relevant areas in computational biology. Fortunately, RNA secondary structure approaches, such as Nussinov's recurrence, involve mathematical operations over affine control loops whose iteration space can be represented by the polyhedral model. This allows us to apply powerful polyhedral compilation techniques based on the transitive closure of dependence graphs to generate parallel tiled code implementing Nussinov's RNA folding. Such techniques fall within the iteration space slicing framework: the transitive dependences are applied to the statement instances of interest to produce valid tiles. The main problem in generating parallel tiled code is defining a proper tile size and tile dimension, which affect the degree of parallelism and code locality. To choose the best tile size and tile dimension, we first construct parallel parametric tiled code (parameters are variables defining tile size). For this purpose, we first generate two nonparametric tiled codes with different fixed tile sizes but with the same code structure and then derive a general affine model, which describes all integer factors available in expressions of those codes. Using this model and known integer factors present in the mentioned expressions (they define the left-hand side of the model), we find unknown integers in this model for each integer factor available in the same fixed tiled code position and replace expressions in this code that include integer factors with expressions that include parameters. Then we use this parallel parametric tiled code to implement the well-known tile size selection (TSS) technique, which allows us to discover, in a given search space, the best tile size and tile dimension maximizing target code performance.
For a given search space, the presented approach allows us to choose the best tile size and tile dimension in parallel tiled code implementing Nussinov's RNA folding. Experimental results, obtained on modern Intel multi-core processors, demonstrate that this code outperforms known closely related implementations when the length of RNA strands is greater than 2500.
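For orientation, the loop nest that such tiling targets can be seen in a plain serial form of Nussinov's recurrence. This is a minimal sketch, not the authors' tiled code: the pairing rules (Watson-Crick plus G-U wobble), the `min_loop` parameter and the function name are assumptions made for illustration.

```python
def nussinov(seq, min_loop=0):
    """Maximum number of nested base pairs in seq under Nussinov's recurrence."""
    # Allowed pairs (an assumption): Watson-Crick pairs plus G-U wobble.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    score = [[0] * n for _ in range(n)]
    # Affine loop nest: cells are filled by increasing subsequence length.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = score[i][j - 1]  # case: base j stays unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = score[i][k - 1] if k > i else 0
                    inner = score[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + inner + 1)
            score[i][j] = best
    return score[0][n - 1]


print(nussinov("GGGAAAUCC"))
```

Each cell depends on cells for shorter subsequences, which is the dependence pattern that tiling has to respect when cells are grouped into blocks.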
THE EFFECT OF TWO-MAGNON SCATTERING ON PARALLEL-PUMP INSTABILITY THRESHOLDS.
Following a general description of the important properties and symmetries of the parallel-pump coupling and of two-magnon scattering, several...theoretical approaches to the problem of the effect of two-magnon scattering on the parallel-pump instability threshold are explored. A successful approach
Studies in optical parallel processing. [All optical and electro-optic approaches
NASA Technical Reports Server (NTRS)
Lee, S. H.
1978-01-01
Threshold and A/D devices for converting a gray scale image into a binary one were investigated for all-optical and opto-electronic approaches to parallel processing. Integrated optical logic circuits (IOC) and optical parallel logic devices (OPAL) were studied as an approach to processing optical binary signals. In the IOC logic scheme, a single row of an optical image is coupled into the IOC substrate at a time through an array of optical fibers. Parallel processing is carried out on each image element of these rows in the IOC substrate, and the resulting output exits via a second array of optical fibers. The OPAL system for parallel processing, which uses a Fabry-Perot interferometer for image thresholding and analog-to-digital conversion, achieves a higher degree of parallel processing than is possible with IOC.
Emergence of dynamic cooperativity in the stochastic kinetics of fluctuating enzymes
NASA Astrophysics Data System (ADS)
Kumar, Ashutosh; Chatterjee, Sambarta; Nandi, Mintu; Dua, Arti
2016-08-01
Dynamic co-operativity in monomeric enzymes is characterized in terms of a non-Michaelis-Menten kinetic behaviour. The latter is believed to be associated with mechanisms that include multiple reaction pathways due to enzymatic conformational fluctuations. Recent advances in single-molecule fluorescence spectroscopy have provided new fundamental insights on the possible mechanisms underlying reactions catalyzed by fluctuating enzymes. Here, we present a bottom-up approach to understand enzyme turnover kinetics at physiologically relevant mesoscopic concentrations informed by mechanisms extracted from single-molecule stochastic trajectories. The stochastic approach, presented here, shows the emergence of dynamic co-operativity in terms of a slowing down of the Michaelis-Menten (MM) kinetics resulting in negative co-operativity. For fewer enzymes, dynamic co-operativity emerges due to the combined effects of enzymatic conformational fluctuations and molecular discreteness. The increase in the number of enzymes, however, suppresses the effect of enzymatic conformational fluctuations such that dynamic co-operativity emerges solely due to the discrete changes in the number of reacting species. These results confirm that the turnover kinetics of fluctuating enzyme based on the parallel-pathway MM mechanism switches over to the single-pathway MM mechanism with the increase in the number of enzymes. For large enzyme numbers, convergence to the exact MM equation occurs in the limit of very high substrate concentration as the stochastic kinetics approaches the deterministic behaviour.
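For reference, the deterministic Michaelis-Menten rate law that the abstract takes as its baseline can be written in standard textbook notation. The symbols below are conventional assumptions and are not defined in the abstract itself: k_1 and k_{-1} are the substrate binding and unbinding rate constants, k_2 is the catalytic rate constant, and [E]_0 is the total enzyme concentration.

```latex
v = \frac{V_{\max}\,[S]}{K_M + [S]},
\qquad V_{\max} = k_2\,[E]_0,
\qquad K_M = \frac{k_{-1} + k_2}{k_1}
```

Departures of the measured turnover rate from this hyperbolic dependence on [S] are what the abstract describes as non-Michaelis-Menten behaviour.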
Emergence of dynamic cooperativity in the stochastic kinetics of fluctuating enzymes.
Kumar, Ashutosh; Chatterjee, Sambarta; Nandi, Mintu; Dua, Arti
2016-08-28
Dynamic co-operativity in monomeric enzymes is characterized in terms of a non-Michaelis-Menten kinetic behaviour. The latter is believed to be associated with mechanisms that include multiple reaction pathways due to enzymatic conformational fluctuations. Recent advances in single-molecule fluorescence spectroscopy have provided new fundamental insights on the possible mechanisms underlying reactions catalyzed by fluctuating enzymes. Here, we present a bottom-up approach to understand enzyme turnover kinetics at physiologically relevant mesoscopic concentrations informed by mechanisms extracted from single-molecule stochastic trajectories. The stochastic approach, presented here, shows the emergence of dynamic co-operativity in terms of a slowing down of the Michaelis-Menten (MM) kinetics resulting in negative co-operativity. For fewer enzymes, dynamic co-operativity emerges due to the combined effects of enzymatic conformational fluctuations and molecular discreteness. The increase in the number of enzymes, however, suppresses the effect of enzymatic conformational fluctuations such that dynamic co-operativity emerges solely due to the discrete changes in the number of reacting species. These results confirm that the turnover kinetics of fluctuating enzyme based on the parallel-pathway MM mechanism switches over to the single-pathway MM mechanism with the increase in the number of enzymes. For large enzyme numbers, convergence to the exact MM equation occurs in the limit of very high substrate concentration as the stochastic kinetics approaches the deterministic behaviour.
Pairwise Sequence Alignment Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jeff Daily, PNNL
2015-05-20
Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets including AVX2 and KNC; and language interfaces for C/C++ and Python.
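The data dependencies mentioned above are easiest to see in a scalar version of Smith-Waterman local alignment. The sketch below is not parasail's API or its vectorized code; the function name, the linear gap penalty and the scoring values are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # cur[j - 1] was written in the same row one step earlier; this
            # in-row dependence is what complicates naive SIMD vectorization.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("TTACGTT", "GGACGGG"))
```

Each cell needs its upper, left and upper-left neighbours, so the cells of a row cannot simply be computed all at once without a layout or scan strategy that resolves the left-neighbour dependence.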
NASA Technical Reports Server (NTRS)
Murray, G. W.; Bohning, O. D.; Kinoshita, R. Y.; Becker, F. J.
1979-01-01
The results of a program to demonstrate the feasibility of Bubble Domain Memory Technology as a mass memory medium for spacecraft applications are summarized. The design, fabrication and test of a partially populated 10^8-bit Data Recorder using 100 Kbit serial bubble memory chips are described. Design tradeoffs, design approach and performance are discussed. This effort resulted in a 10^8-bit recorder with a volume of 858.6 cu in and a weight of 47.2 pounds. The recorder is plug reconfigurable, having the capability of operating as one, two or four independent serial channel recorders or as a single sixteen-bit byte-parallel input recorder. Data rates up to 1.2 Mb/s in a serial mode and 2.4 Mb/s in a parallel mode may be supported. Fabrication and test of the recorder demonstrated the basic feasibility of Bubble Domain Memory technology for such applications. Test results indicate the need for improvement in memory element operating temperature range and detector performance.
Associative architecture for image processing
NASA Astrophysics Data System (ADS)
Adar, Rutie; Akerib, Avidan
1997-09-01
This article presents a new generation in parallel processing architecture for real-time image processing. The approach is implemented in a real-time image processor chip, called the Xium™-2, which combines a fully associative array, providing the parallel engine, with a serial RISC core on the same die. The architecture is fully programmable and can be programmed to implement a wide range of color image processing, computer vision and media processing functions in real time. The associative part of the chip is based on the patent-pending methodology of Associative Computing Ltd. (ACL), which condenses 2048 associative processors, each of 128 'intelligent' bits. Each bit can be a processing bit or a memory bit. At only 33 MHz and in a 0.6 micron manufacturing process, the chip has a computational power of 3 billion ALU operations per second and 66 billion string search operations per second. The fully programmable nature of the Xium™-2 chip enables developers to use ACL tools to write their own proprietary algorithms combined with existing image processing and analysis functions from ACL's extended set of libraries.
Conceptual Design and Optimal Power Control Strategy for an Eco-Friendly Hybrid Vehicle
NASA Astrophysics Data System (ADS)
Nasiri, N. Mir; Chieng, Frederick T. A.
2011-06-01
This paper presents a new concept for a hybrid vehicle using a torque and speed splitting technique. It is implemented by the newly developed controller in combination with a two degree of freedom epicyclic gear transmission. This approach enables optimization of the power split between the less powerful electrical motor and more powerful engine while driving a car load. The power split is fundamentally a dual-energy integration mechanism as it is implemented by using the epicyclic gear transmission that has two inputs and one output for a proper power distribution. The developed power split control system manages the operation of both the inputs to have a known output with the condition of maintaining optimum operating efficiency of the internal combustion engine and electrical motor. This system has a huge potential as it is possible to integrate all the features of hybrid vehicle known to-date such as the regenerative braking system, series hybrid, parallel hybrid, series/parallel hybrid, and even complex hybrid (bidirectional). By using the new power split system it is possible to further reduce fuel consumption and increase overall efficiency.
Atoche, Alejandro Castillo; Castillo, Javier Vázquez
2012-01-01
A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging of radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed into their parallel representation and then mapped into an efficient high performance embedded computing (HPEC) architecture in reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was aggregated in a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such a dual SSA core drastically reduces the computational load of complex RS regularization techniques, achieving the required real-time operational mode. PMID:22736964
Sentence alignment using feed forward neural network.
Fattah, Mohamed Abdel; Ren, Fuji; Kuroiwa, Shingo
2006-12-01
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to align sentences in bilingual parallel corpora based on a feed forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the feed forward neural network. Another set of data was used for testing. Using this new approach, we could achieve an error reduction of 60% over the length-based approach when applied to English-Arabic parallel documents. Moreover, this new approach is valid for any language pair and is quite flexible, since the feature parameter vector may contain more, fewer or different features than those we used in our system, such as a lexical match feature.
Robust recognition of handwritten numerals based on dual cooperative network
NASA Technical Reports Server (NTRS)
Lee, Sukhan; Choi, Yeongwoo
1992-01-01
An approach to robust recognition of handwritten numerals using two networks operating in parallel is presented. The first network uses inputs in Cartesian coordinates, and the second network uses the same inputs transformed into polar coordinates. How the proposed approach achieves robustness to local and global variations of input numerals by handling inputs both in Cartesian coordinates and in their transformed polar coordinates is described. The required network structures and their learning scheme are discussed. Experimental results show that by tracking only a small number of distinctive features for each teaching numeral in each coordinate, the proposed system can provide robust recognition of handwritten numerals.
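The coordinate transformation that feeds the second network can be sketched as follows. The choice of origin is an assumption (the abstract does not say which point the polar transform is taken about), and the function name and sample points are hypothetical.

```python
import math


def to_polar(points, center):
    """Map (x, y) points to (radius, angle in radians) about the given center."""
    cx, cy = center
    return [
        (math.hypot(x - cx, y - cy), math.atan2(y - cy, x - cx))
        for x, y in points
    ]


# Hypothetical feature points sampled from a numeral stroke.
stroke = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)]
polar = to_polar(stroke, center=(0.0, 0.0))
print(polar)
```

In polar form, a rotation of the input about the center becomes a shift in the angle and a uniform scaling multiplies the radius, which is one plausible reason a second view of the same input could help with global variations.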
The transcondylar approach to craniocervical meningiomas.
Rassi, Marcio S; de Oliveira, Jean G; Borba, Luis A B
2017-10-01
Surgical removal of foramen magnum meningiomas poses great challenges due to their deep location within the central skull base and their proximity to vital neurovascular structures. This video depicts the operative nuances of surgical management for a 59-year-old female who presented with a right-sided spinocranial meningioma. Simpson Grade I resection was achieved through a right transcondylar approach. The patient's postoperative period was unremarkable, and she was discharged home on postoperative Day 5 for periodic follow-up. The transcondylar approach safely exposes the craniocervical junction at the anterior aspect of the neuraxis and still allows the surgeon to access the tumor through a parallel plane, with minimum morbidity. The video can be found here: https://youtu.be/P0-kXjAkw9U.
A conservative approach to parallelizing the Sharks World simulation
NASA Technical Reports Server (NTRS)
Nicol, David M.; Riffe, Scott E.
1990-01-01
The parallelization of a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The approach used illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
1987-11-01
The purpose of the workshop was to bring together people whose interests lie in the areas of operating systems, programming languages, and formal... operating system support, and applications. There were parallel discussions on scheduling and distributed languages, and on real-time and operating ...number of key challenges: * Distributed systems, languages, environments - Make transactions efficient. Integrate them into the operating system
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
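How AdaBoost turns weak classifiers into a strong one can be sketched with one boosting round and a weighted vote. This is the textbook binary form with labels +1 and -1, used here as an assumption; the abstract does not state which AdaBoost variant is applied to the 15 BP networks, and the function names and sample data are hypothetical.

```python
import math


def adaboost_round(weights, predictions, labels):
    """One boosting round: return the classifier weight and updated sample weights."""
    # Weighted error of this weak classifier; assumed to lie strictly in (0, 1).
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples gain weight, correctly classified ones lose weight.
    raw = [
        w * math.exp(alpha if p != y else -alpha)
        for w, p, y in zip(weights, predictions, labels)
    ]
    norm = sum(raw)
    return alpha, [w / norm for w in raw]


def strong_classify(alphas, votes):
    """Combine weak votes (+1 or -1) into one decision by weighted majority."""
    return 1 if sum(a * v for a, v in zip(alphas, votes)) >= 0 else -1


# Four samples; the weak classifier gets the last one wrong.
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(alpha, new_weights)
```

After the update, the misclassified sample carries half of the total weight, so the next weak classifier is pushed to focus on it.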
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-01-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520
Soto-Quiros, Pablo
2015-01-01
This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage in using multicore processors and a parallel computing environment to minimize the high execution time. Additionally, speedup increases when the number of logical processors and length of the signal increase.
Jones, Ryan J. R.; Shinde, Aniketa; Guevarra, Dan; ...
2015-01-05
Many energy technologies require electrochemical stability or preactivation of functional materials. Due to the long experiment duration required for either electrochemical preactivation or evaluation of operational stability, parallel screening is required to enable high throughput experimentation. We found that imposing operational electrochemical conditions on a library of materials in parallel creates several opportunities for experimental artifacts. We discuss the electrochemical engineering principles and operational parameters that mitigate artifacts in the parallel electrochemical treatment system. We also demonstrate the effects of resistive losses within the planar working electrode through a combination of finite element modeling and illustrative experiments. Operation of the parallel-plate, membrane-separated electrochemical treatment system is demonstrated by exposing a composition library of mixed metal oxides to oxygen evolution conditions in 1 M sulfuric acid for 2 h. This application is particularly important because the electrolysis and photoelectrolysis of water are promising future energy technologies inhibited by the lack of highly active, acid-stable catalysts containing only earth abundant elements.
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda A [Rochester, MN; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-01-10
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
Archer, Charles J [Rochester, MN; Blocksome, Michael A [Rochester, MN; Peters, Amanda E [Cambridge, MA; Ratterman, Joseph D [Rochester, MN; Smith, Brian E [Rochester, MN
2012-04-17
Methods, apparatus, and products are disclosed for reducing power consumption while synchronizing a plurality of compute nodes during execution of a parallel application that include: beginning, by each compute node, performance of a blocking operation specified by the parallel application, each compute node beginning the blocking operation asynchronously with respect to the other compute nodes; reducing, for each compute node, power to one or more hardware components of that compute node in response to that compute node beginning the performance of the blocking operation; and restoring, for each compute node, the power to the hardware components having power reduced in response to all of the compute nodes beginning the performance of the blocking operation.
2015-01-01
Implementing parallel and multivalued logic operations at the molecular scale has the potential to improve the miniaturization and efficiency of a new generation of nanoscale computing devices. Two-dimensional photon-echo spectroscopy is capable of resolving dynamical pathways on electronic and vibrational molecular states. We experimentally demonstrate the implementation of molecular decision trees, logic operations where all possible values of inputs are processed in parallel and the outputs are read simultaneously, by probing the laser-induced dynamics of populations and coherences in a rhodamine dye mounted on a short DNA duplex. The inputs are provided by the bilinear interactions between the molecule and the laser pulses, and the output values are read from the two-dimensional molecular response at specific frequencies. Our results highlight how ultrafast dynamics between multiple molecular states induced by light–matter interactions can be used as an advantage for performing complex logic operations in parallel, operations that are faster than electrical switching. PMID:25984269
Generalized Philosophy of Alerting with Applications for Parallel Approach Collision Prevention
NASA Technical Reports Server (NTRS)
Winder, Lee F.; Kuchar, James K.
2000-01-01
The goal of the research was to develop formal guidelines for the design of hazard avoidance systems. An alerting system is automation designed to reduce the likelihood of undesirable outcomes that are due to rare failures in a human-controlled system. It accomplishes this by monitoring the system, and issuing warning messages to the human operators when thought necessary to head off a problem. On examination of existing and recently proposed logics for alerting it appears that few commonly accepted principles guide the design process. Different logics intended to address the same hazards may take disparate forms and emphasize different aspects of performance, because each reflects the intuitive priorities of a different designer. Because performance must be satisfactory to all users of an alerting system (implying a universal meaning of acceptable performance) and not just one designer, a proposed logic often undergoes significant piecemeal modification before gaining general acceptance. This report is an initial attempt to clarify the common performance goals by which an alerting system is ultimately judged. A better understanding of these goals will hopefully allow designers to reach the final logic in a quicker, more direct and repeatable manner. As a case study, this report compares three alerting logics for collision prevention during independent approaches to parallel runways, and outlines a fourth alternative incorporating elements of the first three, but satisfying stated requirements. Three existing logics for parallel approach alerting are described. Each follows from different intuitive principles. The logics are presented as examples of three "philosophies" of alerting system design.
A distributed parallel storage architecture and its potential application within EOSDIS
NASA Technical Reports Server (NTRS)
Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony
1994-01-01
We describe the architecture, implementation, and use of a scalable, high performance, distributed-parallel data storage system developed in the ARPA funded MAGIC gigabit testbed. A collection of wide area distributed disk servers operate in parallel to provide logical block level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Terminal Area Procedures for Paired Runways
NASA Technical Reports Server (NTRS)
Lozito, Sandra; Verma, Savita Arora
2011-01-01
Parallel runway operations have been found to increase capacity within the National Airspace, but poor visibility conditions reduce the use of these operations. The NextGen and SESAR Programs have identified the capacity benefits from increased use of closely spaced parallel runways. Previous research examined the concepts and procedures related to parallel runways; however, there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This simulation study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15s (+/- 10s error) at a coupling point that was about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and auto speed control future automation) as well as two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. Results show the operations in this study were acceptable and safe. Subjective workload, when using the pairing procedures and tools, was generally low for both controllers and pilots, and situation awareness was typically moderate to high. Pilot workload was influenced by display type and automation condition. Further research on pairing and off-nominal conditions is required; however, this investigation identified promising findings about the feasibility of closely spaced parallel runway operations.
Parallel text rendering by a PostScript interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kritskii, S.P.; Zastavnoi, B.A.
1994-11-01
The most radical method of increasing the performance of devices controlled by PostScript interpreters may be the use of multiprocessor controllers. This paper presents a method for parallelizing the operation of a PostScript interpreter for rendering text. The proposed method is based on decomposition of the outlines of letters into horizontal strips covering equal areas. The subroutines thus obtained are distributed to the processors in a network and then filled in by conventional sequential algorithms. A special algorithm has been developed for dividing the outlines of characters into subroutines so that each may be colored independently of the others. The algorithm uses special estimates for estimating the correct partition so that the corresponding outlines are divided into horizontal strips. A method is presented for finding such estimates. Two different processing approaches are presented. In the first, one of the processors performs the decomposition of the outlines and distributes the strips to the remaining processors, which are responsible for the rendering. In the second approach, the decomposition process is itself distributed among the processors in the network.
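The idea of cutting a glyph into horizontal strips of equal area can be illustrated with a small sketch. It is a simplification under stated assumptions: instead of working on the outlines with the paper's estimates, it takes a hypothetical per-row coverage profile `row_areas` (filled area in each scanline row) and chooses cut rows so that every strip carries roughly the same fill work. The function name `equal_area_strips` is made up for this example.

```python
def equal_area_strips(row_areas, n_strips):
    """Split rows 0..len(row_areas) into n_strips contiguous strips.

    Returns a list of (first_row, end_row) pairs, end_row exclusive.
    A cut is placed as soon as the running area reaches the next
    multiple of total / n_strips, so strips hold about equal area.
    """
    total = sum(row_areas)
    bounds = [0]
    running = 0.0
    next_target = total / n_strips
    for row, area in enumerate(row_areas):
        running += area
        # A very heavy row may cross several targets; that yields empty strips.
        while len(bounds) < n_strips and running >= next_target:
            bounds.append(row + 1)
            next_target = total * len(bounds) / n_strips
    bounds.append(len(row_areas))
    return list(zip(bounds[:-1], bounds[1:]))
```

Each resulting strip could then be handed to a different processor and filled by a sequential algorithm, which mirrors the first processing approach described above.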
Systems-on-chip approach for real-time simulation of wheel-rail contact laws
NASA Astrophysics Data System (ADS)
Mei, T. X.; Zhou, Y. J.
2013-04-01
This paper presents the development of a systems-on-chip approach to speed up the simulation of wheel-rail contact laws, which can be used to reduce the requirement for high-performance computers and enable simulation in real time for the use of hardware-in-loop for experimental studies of the latest vehicle dynamic and control technologies. The wheel-rail contact laws are implemented using a field programmable gate array (FPGA) device with a design that substantially outperforms modern general-purpose PC platforms or fixed architecture digital signal processor devices in terms of processing time, configuration flexibility and cost. In order to utilise the FPGA's parallel-processing capability, the operations in the contact laws algorithms are arranged in a parallel manner and multi-contact patches are tackled simultaneously in the design. The interface between the FPGA device and the host PC is achieved by using a high-throughput and low-latency Ethernet link. The development is based on FASTSIM algorithms, although the design can be adapted and expanded for even more computationally demanding tasks.
An approach to the optical MSD adder
NASA Astrophysics Data System (ADS)
Takahashi, Hideya; Matsushita, Kenji; Shimizu, Eiji
1990-07-01
The intrinsic parallelism of optical elements for computation is presently taken fuller advantage of than heretofore possible through an optical implementation of the modified signed digit (MSD) number system, which yields carry-free addition and subtraction. In the present optical implementation of the MSD system, optical phase data are used to preclude negative value representation. Attention is given to an MSD adder array for addition operations on two n-digit trinary numbers; the output is composed of n + 1 trinary digits.
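Carry-free addition in the MSD system can be sketched in software. The rule table below is one common two-step textbook formulation for radix-2 digits in {-1, 0, 1}; it is an assumption for illustration and not necessarily the rule set or optical encoding used by the authors. The names `msd_add` and `msd_to_int` are hypothetical. Digits are written most significant first.

```python
def msd_to_int(digits):
    """Value of a radix-2 signed-digit number, most significant digit first."""
    value = 0
    for d in digits:
        value = 2 * value + d
    return value


def msd_add(x, y):
    """Add two n-digit MSD numbers; returns n + 1 digits in {-1, 0, 1}."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    n = len(x)
    xs, ys = x[::-1], y[::-1]  # least significant digit first
    transfer = [0] * (n + 1)
    weight = [0] * (n + 1)
    # Step 1: every position is handled independently. It looks only at its
    # own digit pair and at the pair one position lower, never at a carry.
    for i in range(n):
        p = xs[i] + ys[i]
        lower_nonneg = i == 0 or (xs[i - 1] >= 0 and ys[i - 1] >= 0)
        if p == 2:
            t, w = 1, 0
        elif p == -2:
            t, w = -1, 0
        elif p == 1:
            t, w = (1, -1) if lower_nonneg else (0, 1)
        elif p == -1:
            t, w = (0, -1) if lower_nonneg else (-1, 1)
        else:
            t, w = 0, 0
        transfer[i + 1], weight[i] = t, w
    # Step 2: digit-wise sum of weight and incoming transfer; by construction
    # it never leaves {-1, 0, 1}, so no carry ripples along the word.
    total = [weight[i] + transfer[i] for i in range(n + 1)]
    return total[::-1]
```

Because both steps are purely position-local, all digit positions can be evaluated at the same time, which is the property an optical array exploits.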
Notes on implementation of sparsely distributed memory
NASA Technical Reports Server (NTRS)
Keeler, J. D.; Denning, P. J.
1986-01-01
The Sparsely Distributed Memory (SDM) developed by Kanerva is an unconventional memory design with very interesting and desirable properties. The memory works in a manner that is closely related to modern theories of human memory. The SDM model is discussed in terms of its implementation in hardware. Two appendices discuss the unconventional approaches of the SDM: Appendix A treats a resistive circuit for fast, parallel address decoding; and Appendix B treats a systolic array for high throughput read and write operations.
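The read and write operations of a Kanerva-style memory can be sketched in a few lines. This is a software illustration of the general SDM model, not the hardware design treated in the report; the class name, the parameter values (256-bit words, 1000 hard locations, Hamming radius 112) and the use of Python integers as bit vectors are all assumptions chosen to keep the example small.

```python
import random


class SparseDistributedMemory:
    """Minimal Kanerva-style memory over n_bits-wide binary words."""

    def __init__(self, n_bits=256, n_locations=1000, radius=112, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Fixed random addresses of the hard locations, stored as integers.
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        # One up/down counter per bit per hard location.
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        """Address decoding: locations within the Hamming radius.

        Every comparison is independent of the others, so this step
        could be carried out for all locations in parallel.
        """
        return [i for i, hard in enumerate(self.addresses)
                if bin(hard ^ address).count("1") <= self.radius]

    def write(self, address, word):
        """Add the word to the counters of every activated location."""
        for i in self._active(address):
            row = self.counters[i]
            for b in range(self.n_bits):
                row[b] += 1 if (word >> b) & 1 else -1

    def read(self, address):
        """Sum counters over activated locations and threshold at zero."""
        active = self._active(address)
        word = 0
        for b in range(self.n_bits):
            if sum(self.counters[i][b] for i in active) > 0:
                word |= 1 << b
        return word
```

A typical use is autoassociative: write a pattern at its own address, then read with a slightly corrupted address and compare the result with the stored pattern.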
NASA Technical Reports Server (NTRS)
Gore, Brian Francis; Hooey, Becky Lee; Haan, Nancy; Socash, Connie; Mahlstedt, Eric; Foyle, David C.
2013-01-01
The Closely Spaced Parallel Operations (CSPO) scenario is a complex, human performance model scenario that tested alternate operator roles and responsibilities to a series of off-nominal operations on approach and landing (see Gore, Hooey, Mahlstedt, Foyle, 2013). The model links together the procedures, equipment, crewstation, and external environment to produce predictions of operator performance in response to Next Generation system designs, like those expected in the National Airspace's NextGen concepts. The task analysis that is contained in the present report comes from the task analysis window in the MIDAS software. These tasks link definitions and states for equipment components, environmental features as well as operational contexts. The current task analysis culminated in 3300 tasks that included over 1000 Subject Matter Expert (SME)-vetted, re-usable procedural sets for three critical phases of flight: the Descent, Approach, and Land procedural sets (see Gore et al., 2011 for a description of the development of the tasks included in the model; Gore, Hooey, Mahlstedt, Foyle, 2013 for a description of the model, and its results; Hooey, Gore, Mahlstedt, Foyle, 2013 for a description of the guidelines that were generated from the model's results; Gore, Hooey, Foyle, 2012 for a description of the model's implementation and its settings). The rollout, after landing checks, taxi to gate and arrive at gate illustrated in Figure 1 were not used in the approach and divert scenarios exercised. The other networks in Figure 1 set up appropriate context settings for the flight deck. The current report presents the model's task decomposition from the highest level and decomposes it to finer-grained levels. The first task that is completed by the model is to set all of the initial settings for the scenario runs included in the model (network 75 in Figure 1).
This initialization process also resets the CAD graphic files contained with MIDAS, as well as the embedded operator models that comprise MIDAS. Following the initial settings, the model progresses to begin the first tasks required of the two flight deck operators, the Captain (CA) and the First Officer (FO). The task sets will initialize operator specific settings prior to loading all of the alerts, probes, and other events that occur in the scenario. As a note, the CA and FO were terms used in developing this model but the CA can also be thought of as the Pilot Flying (PF), while the FO can be considered the Pilot-Not-Flying (PNF) or Pilot Monitoring (PM). As such, the document refers to the operators as PF/CA and PNF/FO respectively.
Rapid code acquisition algorithms employing PN matched filters
NASA Technical Reports Server (NTRS)
Su, Yu T.
1988-01-01
The performance of four algorithms using pseudonoise matched filters (PNMFs), for direct-sequence spread-spectrum systems, is analyzed. They are: parallel search with fixed dwell detector (PL-FDD), parallel search with sequential detector (PL-SD), parallel-serial search with fixed dwell detector (PS-FDD), and parallel-serial search with sequential detector (PS-SD). The operation characteristic for each detector and the mean acquisition time for each algorithm are derived. All the algorithms are studied in conjunction with the noncoherent integration technique, which enables the system to operate in the presence of data modulation. Several previous proposals using PNMF are seen as special cases of the present algorithms.
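A parallel search with a fixed dwell can be illustrated by a toy baseband model. The sketch below is an assumption-laden simplification, not the analysis in the report: it uses a length-31 m-sequence generated by the recurrence a[k+5] = a[k+2] XOR a[k], correlates one full code period against every candidate phase (the software analogue of a matched-filter bank), and declares acquisition when the largest correlation exceeds a fixed threshold. Noncoherent integration and data modulation are left out, and the function names are hypothetical.

```python
def m_sequence(n_chips=31):
    """Length-31 maximal-length sequence mapped to +1/-1 chips."""
    state = [1, 0, 0, 0, 0]  # a[k], a[k+1], ..., a[k+4]
    chips = []
    for _ in range(n_chips):
        chips.append(state[0])
        new_bit = state[0] ^ state[2]  # a[k+5] = a[k] XOR a[k+2]
        state = state[1:] + [new_bit]
    return [1 - 2 * c for c in chips]  # bit 0 -> +1, bit 1 -> -1


def correlate_all_phases(received, code):
    """Correlation of one code period against every cyclic code phase."""
    n = len(code)
    return [sum(received[i] * code[(i + phase) % n] for i in range(n))
            for phase in range(n)]


def acquire(received, code, threshold):
    """Return the code phase whose correlation exceeds threshold, else None."""
    corr = correlate_all_phases(received, code)
    best = max(range(len(corr)), key=lambda phase: corr[phase])
    return best if corr[best] > threshold else None
```

In this model the dwell is one code period and the threshold sets the trade-off between missed detection and false alarm; a sequential detector would instead vary the integration time per test.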
Robinson, Thomas N; Jones, Edward L; Dunn, Christina L; Dunne, Bruce; Johnson, Elizabeth; Townsend, Nicole T; Paniccia, Alessandro; Stiegmann, Greg V
2015-06-01
The monopolar "Bovie" is used in virtually every laparoscopic operation. The active electrode and its cord emit radiofrequency energy that couples (or transfers) to nearby conductive material without direct contact. This phenomenon is increased when the active electrode cord is oriented parallel to another wire/cord. The parallel orientation of the "Bovie" and laparoscopic camera cords causes transfer of energy to the camera cord, resulting in cutaneous burns at the camera trocar incision. We hypothesized that separating the active electrode/camera cords would reduce thermal injury occurring at the camera trocar incision in comparison to parallel oriented active electrode/camera cords. In this prospective, blinded, randomized controlled trial, patients undergoing standardized laparoscopic cholecystectomy were randomized to separated active electrode/camera cords or parallel oriented active electrode/camera cords. The primary outcome variable was thermal injury determined by histology from skin biopsied at the camera trocar incision. Eighty-four patients participated. Baseline demographics were similar in the groups for age, sex, preoperative diagnosis, operative time, and blood loss. Thermal injury at the camera trocar incision was lower in the separated versus parallel group (31% vs 57%; P = 0.027). Separation of the laparoscopic camera cord from the active electrode cord decreases thermal injury from antenna coupling at the camera trocar incision in comparison to the parallel orientation of these cords. Therefore, parallel orientation of these cords (an arrangement promoted by integrated operating rooms) should be abandoned. The findings of this study should influence the operating room setup for all laparoscopic cases.
NASA Astrophysics Data System (ADS)
Khoruzhnikov, S. E.; Grudinin, V. A.; Sadov, O. L.; Shevel, A. E.; Titov, V. B.; Kairkanov, A. B.
2015-04-01
The transfer of Big Data over a computer network has been, is, and will remain an important and unavoidable operation. A large variety of astronomical projects produce Big Data. There are a number of methods to transfer the data over a global computer network (Internet) with a range of tools. In this paper we consider the transfer of one piece of Big Data from one point in the Internet to another, in general over a long-range distance: many thousands of kilometers. Several free of charge systems to transfer the Big Data are analyzed here. The most important architecture features are emphasized, and the idea is discussed to add the SDN OpenFlow protocol technique for fine-grain tuning of the data transfer process over several parallel data links.
NASA Astrophysics Data System (ADS)
Bolis, A.; Cantwell, C. D.; Moxey, D.; Serson, D.; Sherwin, S. J.
2016-09-01
A hybrid parallelisation technique for distributed memory systems is investigated for a coupled Fourier-spectral/hp element discretisation of domains characterised by geometric homogeneity in one or more directions. The performance of the approach is mathematically modelled in terms of operation count and communication costs for identifying the most efficient parameter choices. The model is calibrated to target a specific hardware platform after which it is shown to accurately predict the performance in the hybrid regime. The method is applied to modelling turbulent flow using the incompressible Navier-Stokes equations in an axisymmetric pipe and square channel. The hybrid method extends the practical limitations of the discretisation, allowing greater parallelism and reduced wall times. Performance is shown to continue to scale when both parallelisation strategies are used.
PVFS 2000: An operational parallel file system for Beowulf
NASA Technical Reports Server (NTRS)
Ligon, Walt
2004-01-01
The approach has been to develop Parallel Virtual File System version 2 (PVFS2), retaining the basic philosophy of the original file system but completely rewriting the code. It shows the architecture of the server and client components. BMI - BMI is the network abstraction layer. It is designed with a common driver and modules for each protocol supported. The interface is non-blocking, and provides mechanisms for optimizations including pinning user buffers. Currently TCP/IP and GM (Myrinet) modules have been implemented. Trove - Trove is the storage abstraction layer. It provides for storing both data spaces and name/value pairs. Trove can also be implemented using different underlying storage mechanisms including native files, raw disk partitions, SQL and other databases. The current implementation uses native files for data spaces and Berkeley DB for name/value pairs.
NASA Astrophysics Data System (ADS)
Sun, Degui; Wang, Na-Xin; He, Li-Ming; Weng, Zhao-Heng; Wang, Daheng; Chen, Ray T.
1996-06-01
A space-position-logic-encoding scheme is proposed and demonstrated. This encoding scheme not only makes the best use of the convenience of binary logic operation, but is also suitable for the trinary property of modified signed-digit (MSD) numbers. Based on the space-position-logic-encoding scheme, a fully parallel modified signed-digit adder and subtractor is built using optoelectronic switch technologies in conjunction with fiber-multistage 3D optoelectronic interconnects. Thus an effective combination of a parallel algorithm and a parallel architecture is implemented. In addition, the performance of the optoelectronic switches used in this system is experimentally studied and verified. Both the 3-bit experimental model and the experimental results of a parallel addition and a parallel subtraction are provided and discussed. Finally, the speed ratio between the MSD adder and binary adders is discussed and the advantage of the MSD in operating speed is demonstrated.
Coupled ridge waveguide distributed feedback quantum cascade laser arrays
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Ying-Hui; Zhang, Jin-Chuan, E-mail: zhangjinchuan@semi.ac.cn; Yan, Fang-Liang
2015-04-06
A coupled ridge waveguide quantum cascade laser (QCL) array consisting of fifteen elements with parallel integration was presented. In-phase fundamental mode operation in each element is secured by both the index-guided nature of the ridge and delicate loss management by properly designed geometries of the ridges and interspaces. Single-lobe lateral far-field with a nearly diffraction limited beam pattern was obtained. By incorporating a one-dimensional buried distributed feedback grating, the in-phase-operating coupled ridge waveguide QCL design provides an efficient solution to obtaining high output power and stable single longitudinal mode emission. The simplicity of this structure and fabrication process makes this approach attractive to many practical applications.
High-power microwave generation using optically activated semiconductor switches
NASA Astrophysics Data System (ADS)
Nunnally, William C.
1990-12-01
The two prominent types of optically controlled switches, the optically controlled linear (OCL) switch and the optically initiated avalanche (OIA) switch, are described, and their operating parameters are characterized. Two transmission line approaches, one using a frozen-wave generator and the other using an injected-wave generator, for generation of multiple cycles of high-power microwave energy using optically controlled switches are discussed. The point design performances of the series-switch, frozen-wave generator and the parallel-switch, injected-wave generator are compared. The operating and performance limitations of the optically controlled switch types are discussed, and additional research needed to advance the development of the optically controlled, bulk, semiconductor switches is indicated.
A MEMS approach to determine the biochemical oxygen demand (BOD) of wastewaters
NASA Astrophysics Data System (ADS)
Recoules, L.; Migaou, A.; Dollat, X.; Thouand, G.; Gue, A. M.; Boukabache, A.
2017-07-01
A MEMS approach to obtain an efficient tool for the evaluation of the biochemical oxygen demand (BOD) of wastewaters is introduced. Its operating principle is based on the measurement of oxygen concentration in water samples containing organic pollutants and specific bacteria. The microsystem has been designed to perform multiple and parallel measurements in a poly-wells microfluidic device. The monitoring of the bacterial activity is ensured by optical sensors incorporated in each well of the fluidic network. By using an optode sensor, it is hereby demonstrated that this approach is efficient to measure organic pollutants by testing different Luria Bertani buffer dilutions. These results also show that it is possible to reduce the duration of measurements from 5 d (BOD5) of the standard approach to a few hours, typically 3 h-5 h.
Three paths toward the quantum angle operator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gazeau, Jean Pierre, E-mail: gazeau@apc.univ-paris7.fr; Szafraniec, Franciszek Hugon, E-mail: franciszek.szafraniec@uj.edu.pl
2016-12-15
We examine mathematical questions around the angle (or phase) operator associated with a number operator through a short list of basic requirements. We implement three methods of construction of quantum angle. The first one is based on operator theory and parallels the definition of angle for the upper half-circle through its cosine and completed by a sign inversion. The two other methods are integral quantization generalizing in a certain sense the Berezin–Klauder approaches. One method pertains to Weyl–Heisenberg integral quantization of the plane viewed as the phase space of the motion on the line. It depends on a family of “weight” functions on the plane. The third method rests upon coherent state quantization of the cylinder viewed as the phase space of the motion on the circle. The construction of these coherent states depends on a family of probability distributions on the line.
Parallel CARLOS-3D code development
DOE Office of Scientific and Technical Information (OSTI.GOV)
Putnam, J.M.; Kotulski, J.D.
1996-02-01
CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel supercomputers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao-Wilton-Glisson roof-top basis for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions to the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines, and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at a high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.
NASA Technical Reports Server (NTRS)
Stupl, Jan; Faber, Nicolas; Foster, Cyrus; Yang, Fan Yang; Nelson, Bron; Aziz, Jonathan; Nuttall, Andrew; Henze, Chris; Levit, Creon
2014-01-01
This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has shown that a few ground-based systems consisting of 10 kilowatt class lasers directed by 1.5 meter telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency of the system. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset. The simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that is regularly updating the LightForce engagement strategy, as it would be during actual operations. In this paper we present our simulation approach to parallelize the efficiency analysis, its computational performance and the resulting expected efficiency of the LightForce collision avoidance system. 
Results indicate that, utilizing a network of four LightForce stations with 20 kilowatt lasers, 85% of all conjunctions with a probability of collision Pc > 10^-6 can be mitigated.
ARMD Strategic Thrust 6: Assured Autonomy for Aviation Transformation
NASA Technical Reports Server (NTRS)
Ballin, Mark; Holbrook, Jon; Sharma, Shivanjli
2016-01-01
In collaboration with the external community and other government agencies, NASA will develop enabling technologies, standards, and design guidelines to support cost-effective applications of automation and limited autonomy for individual components of aviation systems. NASA will also provide foundational knowledge and methods to support the next epoch. Research will address issues of verification and validation, operational evaluation, national policy, and societal cost-benefit. Two research and development approaches to aviation autonomy will advance in parallel. The Increasing Autonomy (IA) approach will seek to advance knowledge and technology through incremental increases in machine-based support of existing human-centered tasks, leading to long-term reallocation of functions between humans and machines. The Autonomy as a New Technology (ANT) approach seeks advances by developing technology to achieve goals that are not currently possible using human-centered concepts of operation. IA applications are mission-enhancing, and their selection will be based on benefits achievable relative to existing operations. ANT applications are mission-enabling, and their value will be assessed based on societal benefit resulting from a new capability. The expected demand for small autonomous unmanned aircraft systems (UAS) provides an opportunity for development of ANT applications. Supervisory autonomy may be implemented as an expansion of the number of functions or systems that may be controlled by an individual human operator. Convergent technology approaches, such as the use of electronic flight bags and existing network servers, will be leveraged to the maximum extent possible.
Two Years of International Cooperation on Conjunction Mitigation
NASA Astrophysics Data System (ADS)
Kelso, T. S.
2010-09-01
In an effort to mitigate the risks associated with satellite close approaches in the geostationary belt, several satellite operators came together in early 2008 to establish what is now known as the GEO Data Center. The GEO Data Center initially provided a framework for satellite operators to share orbital data for their fleets of satellites to be used to perform conjunction analysis and provide automated notification of close approaches via the SOCRATES-GEO system. After two years of operations, the GEO Data Center now has 14 members providing data for 186 satellites. Since the Iridium 33-Cosmos 2251 collision, a parallel system was set up with a LEO Data Center, which already has seven members providing data for 101 satellites. These data centers have already shown the significant benefit of sharing orbital data, particularly in terms of reducing positional uncertainty and, thereby, the number of false alarms. This paper will address the current framework for these efforts, highlighting how a service-oriented architecture is used to support orbital operations and increase efficiency of analysis and resolution of risk-mitigation tasks. It will show how the interactive work flow is used to quickly assess new maneuvers to determine whether they have successfully reduced the chances of a particular close approach without causing other close approaches elsewhere. It will also show how independent space situational awareness organizations can be employed to provide a more complete picture of the threat from nonparticipating satellites and the debris population.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
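To make the idea of if-then rules expressed as Boolean conjunctions concrete, here is a minimal software sketch. It is not the ASOCS network described in the report: the rule representation, the names `matches` and `evaluate`, the sample rules, and the "first matching rule wins" policy are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# A rule is (conditions, output). The conditions form a conjunction:
# every listed variable must have the required truth value.
Rule = Tuple[Dict[str, bool], bool]


def matches(conditions: Dict[str, bool], inputs: Dict[str, bool]) -> bool:
    """True when every literal of the conjunction is satisfied by the inputs."""
    return all(inputs.get(var) == required for var, required in conditions.items())


def evaluate(rules: List[Rule], inputs: Dict[str, bool]) -> Optional[bool]:
    """Return the output of the first matching rule, or None if no rule matches.

    The first-match policy is an assumption made for this sketch.
    """
    for conditions, output in rules:
        if matches(conditions, inputs):
            return output
    return None


# Hypothetical rule set: "if A and not B then Z", "if C then not Z".
rules: List[Rule] = [
    ({"A": True, "B": False}, True),
    ({"C": True}, False),
]
```

In a hardware network each conjunction could be checked concurrently; the sequential loop here only mimics the logical result.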
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Brian Hears: Online Auditory Processing Using Vectorization Over Channels
Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453
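The vectorization idea can be sketched with NumPy as follows. This is not the Brian Hears API: the function name `onepole_filterbank`, the choice of a simple one-pole low-pass filter per channel, and all parameter values are assumptions for illustration. The point is only that the Python loop runs over time samples while each step updates every frequency channel in one array operation.

```python
import numpy as np


def onepole_filterbank(sound, cutoffs_hz, samplerate_hz):
    """Filter one mono signal through a bank of one-pole low-pass filters.

    The loop runs over time samples only; each iteration updates all
    frequency channels at once with a single NumPy array operation.
    """
    sound = np.asarray(sound, dtype=float)
    cutoffs_hz = np.asarray(cutoffs_hz, dtype=float)
    # Per-channel coefficient of y[n] = a * y[n-1] + (1 - a) * x[n].
    a = np.exp(-2.0 * np.pi * cutoffs_hz / samplerate_hz)
    state = np.zeros_like(cutoffs_hz)          # one state value per channel
    out = np.empty((sound.size, cutoffs_hz.size))
    for n, x in enumerate(sound):              # sequential in time
        state = a * state + (1.0 - a) * x      # vectorized over channels
        out[n] = state
    return out


# Hypothetical usage: 3000 channels, 441 samples of a 1 kHz tone at 44.1 kHz.
fs = 44100.0
t = np.arange(441) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)
channels = np.geomspace(20.0, 20000.0, 3000)
response = onepole_filterbank(tone, channels, fs)   # shape (441, 3000)
```

Because the interpreter executes only one loop iteration per sample regardless of the number of channels, the interpretation overhead is shared across all 3000 channels.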
NASA Astrophysics Data System (ADS)
Elbaz, Reouven; Torres, Lionel; Sassatelli, Gilles; Guillemin, Pierre; Bardouillet, Michel; Martinez, Albert
The bus between the System on Chip (SoC) and the external memory is one of the weakest points of computer systems: an adversary can easily probe this bus in order to read private data (data confidentiality concern) or to inject data (data integrity concern). The conventional way to protect data against such attacks and to ensure data confidentiality and integrity is to implement two dedicated engines: one performing data encryption and another data authentication. This approach, while secure, prevents parallelizability of the underlying computations. In this paper, we introduce the concept of Block-Level Added Redundancy Explicit Authentication (BL-AREA) and we describe a Parallelized Encryption and Integrity Checking Engine (PE-ICE) based on this concept. BL-AREA and PE-ICE have been designed to provide an effective solution to ensure both security services while allowing for full parallelization on processor read and write operations and optimizing the hardware resources. Compared to standard encryption which ensures only confidentiality, we show that PE-ICE additionally guarantees code and data integrity for less than 4% of run-time performance overhead.
Message Passing and Shared Address Space Parallelism on an SMP Cluster
NASA Technical Reports Server (NTRS)
Shan, Hongzhang; Singh, Jaswinder P.; Oliker, Leonid; Biswas, Rupak; Biegel, Bryan (Technical Monitor)
2002-01-01
Currently, message passing (MP) and shared address space (SAS) are the two leading parallel programming paradigms. MP has been standardized with MPI, and is the more common and mature approach; however, code development can be extremely difficult, especially for irregularly structured computations. SAS offers substantial ease of programming, but may suffer from performance limitations due to poor spatial locality and high protocol overhead. In this paper, we compare the performance of and the programming effort required for six applications under both programming models on a 32-processor PC-SMP cluster, a platform that is becoming increasingly attractive for high-end scientific computing. Our application suite consists of codes that typically do not exhibit scalable performance under shared-memory programming due to their high communication-to-computation ratios and/or complex communication patterns. Results indicate that SAS can achieve about half the parallel efficiency of MPI for most of our applications, while being competitive for the others. A hybrid MPI+SAS strategy shows only a small performance advantage over pure MPI in some cases. Finally, improved implementations of two MPI collective operations on PC-SMP clusters are presented.
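For readers unfamiliar with the metric, parallel efficiency is conventionally defined as speedup divided by processor count. The short sketch below shows the arithmetic; the timings are hypothetical values invented for illustration and are not measurements from the paper.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Ratio of serial run time to parallel run time."""
    return serial_time / parallel_time


def parallel_efficiency(serial_time: float, parallel_time: float,
                        processors: int) -> float:
    """Speedup divided by the number of processors (1.0 is ideal)."""
    return speedup(serial_time, parallel_time) / processors


# Hypothetical timings on 32 processors (not taken from the paper).
serial = 320.0      # seconds on one processor
mpi_time = 12.5     # seconds with a message-passing version
sas_time = 25.0     # seconds with a shared-address-space version

mpi_eff = parallel_efficiency(serial, mpi_time, 32)
sas_eff = parallel_efficiency(serial, sas_time, 32)
```

With these made-up numbers the second version reaches half the efficiency of the first, which is the kind of ratio the abstract reports for most of its applications.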
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2013 CFR
2013-10-01
... 46 Shipping 4 2013-10-01 2013-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 46 Shipping 4 2014-10-01 2014-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2012 CFR
2012-10-01
... 46 Shipping 4 2012-10-01 2012-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
46 CFR 111.12-7 - Voltage regulation and parallel operation.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 46 Shipping 4 2011-10-01 2011-10-01 false Voltage regulation and parallel operation. 111.12-7 Section 111.12-7 Shipping COAST GUARD, DEPARTMENT OF HOMELAND SECURITY (CONTINUED) ELECTRICAL ENGINEERING ELECTRIC SYSTEMS-GENERAL REQUIREMENTS Generator Construction and Circuits § 111.12-7 Voltage regulation and...
The Goddard Space Flight Center Program to develop parallel image processing systems
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1972-01-01
Parallel image processing, which is defined as image processing where all points of an image are operated upon simultaneously, is discussed. Coherent optical, noncoherent optical, and electronic methods are considered parallel image processing techniques.
Parallel architectures for iterative methods on adaptive, block structured grids
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1983-01-01
A parallel computer architecture well suited to the solution of partial differential equations in complicated geometries is proposed. Algorithms for partial differential equations contain a great deal of parallelism. But this parallelism can be difficult to exploit, particularly on complex problems. One approach to extraction of this parallelism is the use of special purpose architectures tuned to a given problem class. The architecture proposed here is tuned to boundary value problems on complex domains. An adaptive elliptic algorithm which maps effectively onto the proposed architecture is considered in detail. Two levels of parallelism are exploited by the proposed architecture. First, by making use of the freedom one has in grid generation, one can construct grids which are locally regular, permitting a one to one mapping of grids to systolic style processor arrays, at least over small regions. All local parallelism can be extracted by this approach. Second, though there may be a regular global structure to the grids constructed, there will be parallelism at this level. One approach to finding and exploiting this parallelism is to use an architecture having a number of processor clusters connected by a switching network. The use of such a network creates a highly flexible architecture which automatically configures to the problem being solved.
Application of parallelized software architecture to an autonomous ground vehicle
NASA Astrophysics Data System (ADS)
Shakya, Rahul; Wright, Adam; Shin, Young Ho; Momin, Orko; Petkovsek, Steven; Wortman, Paul; Gautam, Prasanna; Norton, Adam
2011-01-01
This paper presents improvements made to Q, an autonomous ground vehicle designed to participate in the Intelligent Ground Vehicle Competition (IGVC). For the 2010 IGVC, Q was upgraded with a new parallelized software architecture and a new vision processor. Improvements were made to the power system, reducing the number of batteries required for operation from six to one. In previous years, a single state machine was used to execute the bulk of processing activities including sensor interfacing, data processing, path planning, navigation algorithms and motor control. This inefficient approach led to poor software performance and made the software difficult to maintain or modify. For IGVC 2010, the team implemented a modular parallel architecture using the National Instruments (NI) LabVIEW programming language. The new architecture divides all the necessary tasks (motor control, navigation, sensor data collection, etc.) into well-organized components that execute in parallel, providing considerable flexibility and facilitating efficient use of processing power. Computer vision is used to detect white lines on the ground and determine their location relative to the robot. With the new vision processor and some optimization of the image processing algorithm used last year, two frames can be acquired and processed in 70 ms. With all these improvements, Q placed 2nd in the autonomous challenge.
Mine Hoist Operator Training System. Phase I Report.
1978-11-01
Bodies of Knowledge. Function: control speed of conveyances; hold conveyances in position. Structural components: types of brakes (disc; drum: jaw...parallel motion); components of each type (disc/drum; pads/shoes; operating mechanisms); operating mediums for braking (hydraulic/pneumatic; manual...). Checklist items: shaft guides (wood; steel rails; wire rope: full lock, half lock); brakes (drum: jaw, parallel motion; disc); levels; drive motors; single
An Island Grouping Genetic Algorithm for Fuzzy Partitioning Problems
Salcedo-Sanz, S.; Del Ser, J.; Geem, Z. W.
2014-01-01
This paper presents a novel fuzzy clustering technique based on grouping genetic algorithms (GGAs), which are a class of evolutionary algorithms especially modified to tackle grouping problems. Our approach hinges on a GGA devised for fuzzy clustering by means of a novel encoding of individuals (containing elements and clusters sections), a new fitness function (a superior modification of the Davies Bouldin index), specially tailored crossover and mutation operators, and the use of a scheme based on a local search and a parallelization process, inspired from an island-based model of evolution. The overall performance of our approach has been assessed over a number of synthetic and real fuzzy clustering problems with different objective functions and distance measures, from which it is concluded that the proposed approach shows excellent performance in all cases. PMID:24977235
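For orientation, the standard (crisp) Davies-Bouldin index can be computed as in the sketch below. The paper uses a modified version of this index as its fitness function, and that modification is not reproduced here; the function names and the sample data are assumptions for illustration. The sketch assumes at least two non-empty clusters with distinct centroids.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def distance(p: Point, q: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters: List[List[Point]]) -> float:
    """Standard Davies-Bouldin index; lower values mean better separation."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its own cluster centroid.
    scatter = [
        sum(distance(p, ctr) for p in c) / len(c)
        for c, ctr in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / distance(centers[i], centers[j])
            for j in range(k) if j != i
        )
    return total / k


# Hypothetical data: two tight, well-separated clusters.
example = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
score = davies_bouldin(example)
```

A genetic algorithm would typically minimize such an index, or maximize its inverse, when comparing candidate partitions.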
Knepper, Andreas; Heiser, Michael; Glauche, Florian; Neubauer, Peter
2014-12-01
The enormous number of possible variations in bioprocesses makes it challenging for process development to fix a commercial process within cost and time constraints. Although some cultivation systems and some devices for unit operations combine the latest technology on miniaturization, parallelization, and sensing, the degree of automation in upstream and downstream bioprocess development is still limited to single steps. We aim to meet this challenge with an interdisciplinary approach that significantly reduces development times and costs. As a first step, we scaled down analytical assays to the microliter scale and created automated procedures for starting the cultivation and monitoring the optical density (OD), pH, concentrations of glucose and acetate in the culture medium, and product formation in fed-batch cultures in the 96-well format. Then, the separate measurements of pH, OD, and concentrations of acetate and glucose were combined into one method. This method enables automated process monitoring at dedicated intervals (e.g., also during the night). With this approach, we managed to increase the information content of cultivations in 96-microwell plates, thus turning them into a suitable tool for high-throughput bioprocess development. Here, we present the flowcharts as well as cultivation data of our automation approach. © 2014 Society for Laboratory Automation and Screening.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.
2015-12-02
We present the Clenshaw–Curtis Spectral Quadrature (SQ) method for real-space O(N) Density Functional Theory (DFT) calculations. In this approach, all quantities of interest are expressed as bilinear forms or sums over bilinear forms, which are then approximated by spatially localized Clenshaw–Curtis quadrature rules. This technique is identically applicable to both insulating and metallic systems, and in conjunction with local reformulation of the electrostatics, enables the O(N) evaluation of the electronic density, energy, and atomic forces. The SQ approach also permits infinite-cell calculations without recourse to Brillouin zone integration or large supercells. We employ a finite difference representation in order to exploit the locality of electronic interactions in real space, enable systematic convergence, and facilitate large-scale parallel implementation. In particular, we derive expressions for the electronic density, total energy, and atomic forces that can be evaluated in O(N) operations. We demonstrate the systematic convergence of energies and forces with respect to quadrature order as well as truncation radius to the exact diagonalization result. In addition, we show convergence with respect to mesh size to established O(N^3) planewave results. In conclusion, we establish the efficiency of the proposed approach for high temperature calculations and discuss its particular suitability for large-scale parallel computation.
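As background for the quadrature rule named above, the sketch below builds a plain Clenshaw-Curtis rule on [-1, 1] and uses it to integrate an ordinary function. It does not implement the SQ method itself, which applies such rules to bilinear forms of the Hamiltonian; the function names and the closed-form weight expression used here are a generic textbook formulation chosen for illustration.

```python
import math
from typing import Callable, List, Tuple


def clenshaw_curtis(n: int) -> Tuple[List[float], List[float]]:
    """Nodes and weights of the (n + 1)-point Clenshaw-Curtis rule on [-1, 1]."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Nodes are the extrema of the Chebyshev polynomial T_n.
    nodes = [math.cos(k * math.pi / n) for k in range(n + 1)]
    weights = []
    for k in range(n + 1):
        c_k = 1.0 if k in (0, n) else 2.0        # endpoint nodes get half weight
        s = 0.0
        for j in range(1, n // 2 + 1):
            b_j = 1.0 if 2 * j == n else 2.0     # last term halved when n is even
            s += b_j / (4 * j * j - 1) * math.cos(2 * j * k * math.pi / n)
        weights.append(c_k / n * (1.0 - s))
    return nodes, weights


def integrate(f: Callable[[float], float], n: int) -> float:
    """Approximate the integral of f over [-1, 1] with an (n + 1)-point rule."""
    nodes, weights = clenshaw_curtis(n)
    return sum(w * f(x) for x, w in zip(nodes, weights))


# Usage: integrate exp(x) over [-1, 1]; the exact value is e - 1/e.
approx = integrate(math.exp, 16)
exact = math.e - 1.0 / math.e
```

For smooth integrands the error falls off rapidly with the quadrature order, which is the behaviour the abstract refers to when it discusses convergence with respect to quadrature order.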
Experimental characterization of a binary actuated parallel manipulator
NASA Astrophysics Data System (ADS)
Giuseppe, Carbone
2016-05-01
This paper describes the BAPAMAN (Binary Actuated Parallel MANipulator) series of parallel manipulators conceived at the Laboratory of Robotics and Mechatronics (LARM). The basic common characteristics of the BAPAMAN series are described. In particular, the paper outlines the use of a reduced number of active degrees of freedom and of design solutions with flexural joints and Shape Memory Alloy (SMA) actuators to achieve miniaturization, cost reduction and easy operation. Given the peculiarities of the BAPAMAN architecture, specific experimental tests have been proposed and carried out to validate the proposed design and to evaluate the practical operation performance of a built prototype, in particular in terms of operation and workspace characteristics.
Mathematical and Numerical Aspects of the Adaptive Fast Multipole Poisson-Boltzmann Solver
Zhang, Bo; Lu, Benzhuo; Cheng, Xiaolin; ...
2013-01-01
This paper summarizes the mathematical and numerical theories and computational elements of the adaptive fast multipole Poisson-Boltzmann (AFMPB) solver. We introduce and discuss the following components in order: the Poisson-Boltzmann model, boundary integral equation reformulation, surface mesh generation, the nodepatch discretization approach, Krylov iterative methods, the new version of fast multipole methods (FMMs), and a dynamic prioritization technique for scheduling parallel operations. For each component, we also remark on feasible approaches for further improvements in efficiency, accuracy and applicability of the AFMPB solver to large-scale long-time molecular dynamics simulations. Lastly, the potential of the solver is demonstrated with preliminary numerical results.
A time-parallel approach to strong-constraint four-dimensional variational data assimilation
NASA Astrophysics Data System (ADS)
Rao, Vishwas; Sandu, Adrian
2016-05-01
A parallel-in-time algorithm based on an augmented Lagrangian approach is proposed to solve four-dimensional variational (4D-Var) data assimilation problems. The assimilation window is divided into multiple sub-intervals, which allows the cost function and gradient computations to be parallelized. The solutions to the continuity equations across interval boundaries are added as constraints. The augmented Lagrangian approach leads to a different formulation of the variational data assimilation problem than the weakly constrained 4D-Var. A combination of serial and parallel 4D-Vars to increase performance is also explored. The methodology is illustrated on data assimilation problems involving the Lorenz-96 and the shallow water models.
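One generic way to write such a formulation is sketched below. The notation is an assumption introduced here and is not taken from the paper: the window is split into N sub-intervals, x_i is the state at the start of sub-interval i, M_i propagates the model across sub-interval i, x_b and B are the background state and its error covariance, and J_i^obs collects the observation misfit inside sub-interval i.

```latex
\begin{aligned}
J(x_0,\dots,x_{N-1}) &= \tfrac{1}{2}\,(x_0 - x_b)^{T} B^{-1} (x_0 - x_b)
  + \sum_{i=0}^{N-1} J_i^{\mathrm{obs}}(x_i), \\
c_i(x_i, x_{i+1}) &= x_{i+1} - \mathcal{M}_i(x_i) = 0,
  \qquad i = 0,\dots,N-2, \\
\mathcal{L}_{\mu}(x,\lambda) &= J(x_0,\dots,x_{N-1})
  - \sum_{i=0}^{N-2} \lambda_i^{T} c_i
  + \frac{\mu}{2} \sum_{i=0}^{N-2} \lVert c_i \rVert^{2}.
\end{aligned}
```

In this sketch each term J_i^obs and each model propagation M_i(x_i) depends only on its own sub-interval's starting state, so they can be evaluated concurrently, while the multipliers λ_i and the penalty parameter μ enforce continuity of the trajectory at the sub-interval boundaries.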
Exploring types of play in an adapted robotics program for children with disabilities.
Lindsay, Sally; Lam, Ashley
2018-04-01
Play is an important occupation in a child's development. Children with disabilities often have fewer opportunities to engage in meaningful play than typically developing children. The purpose of this study was to explore the types of play (i.e., solitary, parallel and co-operative) within an adapted robotics program for children with disabilities aged 6-8 years. This study draws on detailed observations of each of the six robotics workshops and interviews with 53 participants (21 children, 21 parents and 11 programme staff). Our findings showed that four children engaged in solitary play, where all but one showed signs of moving towards parallel play. Six children demonstrated parallel play during all workshops. The remainder of the children had mixed play types (solitary, parallel and/or co-operative) throughout the robotics workshops. We observed more parallel and co-operative, and less solitary play as the programme progressed. Ten different children displayed co-operative behaviours throughout the workshops. The interviews highlighted how staff supported children's engagement in the programme. Meanwhile, parents reported on their child's development of play skills. An adapted LEGO® robotics program has potential to develop the play skills of children with disabilities in moving from solitary towards more parallel and co-operative play. Implications for rehabilitation: Educators and clinicians working with children who have disabilities should consider the potential of LEGO® robotics programs for developing their play skills. Clinicians should consider how the extent of their involvement in prompting and facilitating children's engagement and play within a robotics program may influence their ability to interact with their peers. Educators and clinicians should incorporate both structured and unstructured free-play elements within a robotics program to facilitate children's social development.
Parallel approach in RDF query processing
NASA Astrophysics Data System (ADS)
Vajgl, Marek; Parenica, Jan
2017-07-01
Parallel processing is nowadays a very cheap way to increase computational power, because multithreaded computational units are readily available. Such hardware has become a typical part of today's personal computers and notebooks and is widespread. This contribution reports experiments on how the evaluation of a computationally complex inference algorithm over RDF data can be parallelized on graphics cards to decrease computation time.
Development of a Wake Vortex Spacing System for Airport Capacity Enhancement and Delay Reduction
NASA Technical Reports Server (NTRS)
Hinton, David A.; OConnor, Cornelius J.
2000-01-01
The Terminal Area Productivity project has developed the technologies required (weather measurement, wake prediction, and wake measurement) to determine the aircraft spacing needed to prevent wake vortex encounters in various weather conditions. The system performs weather measurements, predicts bounds on wake vortex behavior in those conditions, derives safe wake spacing criteria, and validates the wake predictions with wake vortex measurements. System performance to date indicates that the potential runway arrival rate increase with Aircraft VOrtex Spacing System (AVOSS), considering common path effects and ATC delivery variance, is 5% to 12% depending on the ratio of large and heavy aircraft. The concept demonstration system, using early generation algorithms and minimal optimization, is performing the wake predictions with adequate robustness such that only 4 hard exceedances have been observed in 1235 wake validation cases. This performance demonstrates the feasibility of predicting wake behavior bounds with multiple uncertainties present, including the unknown aircraft weight and speed, weather persistence between the wake prediction and the observations, and the location of the weather sensors several kilometers from the approach location. A concept for the use of the AVOSS system for parallel runway operations has been suggested, and an initial study at the JFK International Airport suggests that a simplified AVOSS system can be successfully operated using only a single lidar as both the weather sensor and the wake validation instrument. Such a self-contained AVOSS would be suitable for wake separation close to the airport, as is required for parallel approach concepts such as SOIA.
A Fusion Nuclear Science Facility for a fast-track path to DEMO
Garofalo, Andrea M.; Abdou, M.; Canik, John M.; ...
2014-10-01
An accelerated fusion energy development program, a “fast-track” approach, requires developing an understanding of fusion nuclear science (FNS) in parallel with research on ITER to study burning plasmas. A Fusion Nuclear Science Facility (FNSF) in parallel with ITER provides the capability to resolve FNS feasibility issues related to power extraction, tritium fuel sustainability, and reliability, and to begin construction of DEMO upon the achievement of Q~10 in ITER. Fusion nuclear components, including the first wall (FW)/blanket, divertor, heating/fueling systems, etc. are complex systems with many inter-related functions and different materials, fluids, and physical interfaces. These in-vessel nuclear components must operate continuously and reliably with: (a) plasma exposure, surface particle and radiation loads, (b) high-energy neutron fluxes and their interactions in materials (e.g. peaked volumetric heating with steep gradients, tritium production, activation, atomic displacements, gas production, etc.), (c) strong magnetic fields with temporal and spatial variations (electromagnetic coupling to the plasma including off-normal events like disruptions), and (d) a high-temperature, high-vacuum, chemically active environment. While many of these conditions and effects are being studied with separate and multiple effect experimental test stands and modeling, fusion nuclear conditions cannot be completely simulated outside the fusion environment. This means there are many new multi-physics, multi-scale phenomena and synergistic effects yet to be discovered and accounted for in the understanding, design and operation of fusion as a self-sustaining, energy producing system, and significant experimentation and operational experience in a true fusion environment is an essential requirement.
In the following sections we discuss the FNSF objectives, describe the facility requirements and a facility concept and operation approach that can accomplish those objectives, and assess the readiness to construct with respect to several key FNSF issues: materials, steady-state operation, disruptions, power exhaust, and breeding blanket. Finally we present our conclusions.
Talia, Adrian J; Coetzee, Cassandra; Tirosh, Oren; Tran, Phong
2018-01-08
Total hip arthroplasty is one of the most commonly performed surgical procedures worldwide. There are a number of surgical approaches for total hip arthroplasty and no high-level evidence supporting one approach over the other. Each approach has its unique benefits and drawbacks. This trial aims to directly compare the three most common surgical approaches for total hip arthroplasty. This is a single-centre study conducted at Western Health, Melbourne, Australia, a large metropolitan centre. It is a pragmatic, parallel three-arm, randomised controlled trial. Sample size will be 243 participants (81 in each group). Randomisation will be secure, web-based and managed by an independent statistician. Patients and research team will be blinded pre-operatively, but not post-operatively. Intervention will be either direct anterior, lateral or posterior approach for total hip arthroplasty, and the three arms will be directly compared. Participants will be aged over 18 years, able to provide informed consent and recruited from our outpatients. Patients who are having revision surgery or have indications for hip replacement other than osteoarthritis (e.g., fracture, malignancy, developmental dysplasia) will be excluded from the trial. The Oxford Hip Score will be determined for patients pre-operatively and at 6 weeks and 6, 12 and 24 months post-operatively. The Oxford Hip Score at 24 months will be the primary outcome measure. Secondary outcome measures will be dislocation, infection, intraoperative and peri-prosthetic fracture rate, length of hospital stay and pain level, reported using a visual analogue scale. Many studies have evaluated approaches for total hip arthroplasty, and arthroplasty registries worldwide are now collecting this data. However, no study to date has compared these three common approaches directly in a randomised fashion. No trial has used patient-reported outcome measures to evaluate success.
This pragmatic study aims to identify differences in patient perception of total hip arthroplasty depending on surgical approach. Australian New Zealand Clinical Trials Registry, ACTRN12617000272392. Registered on 22 February 2017.
Optimal expression evaluation for data parallel architectures
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
A data parallel machine represents an array or other composite data structure by allocating one processor (at least conceptually) per data item. A pointwise operation can be performed between two such arrays in unit time, provided their corresponding elements are allocated in the same processors. If the arrays are not aligned in this fashion, the cost of moving one or both of them is part of the cost of the operation. The choice of where to perform the operation then affects this cost. If an expression with several operands is to be evaluated, there may be many choices of where to perform the intermediate operations. An efficient algorithm is given to find the minimum-cost way to evaluate an expression, for several different data parallel architectures. This algorithm applies to any architecture in which the metric describing the cost of moving an array is robust. This encompasses most of the common data parallel communication architectures, including meshes of arbitrary dimension and hypercubes. Remarks are made on several variations of the problem, some of which are solved and some of which remain open.
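The placement problem described above can be illustrated with a small dynamic program over the expression tree. This is only a sketch under stated assumptions, not the authors' algorithm: array alignments are modeled as integer offsets on a line, the movement cost is assumed to be the absolute offset difference, and the names `Node`, `move_cost` and `min_cost_table` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class Node:
    """Binary expression tree; a leaf carries the fixed offset of its array."""
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    position: Optional[int] = None  # set for leaves only


def move_cost(a: int, b: int) -> int:
    # Assumed metric: cost of shifting an array from offset a to offset b.
    return abs(a - b)


def min_cost_table(node: Node, candidates: Sequence[int]) -> Dict[int, int]:
    """Map each candidate offset p to the cheapest cost of having the value
    of `node` available at p."""
    if node.position is not None:
        # A leaf only needs to be moved from its home offset to p.
        return {p: move_cost(node.position, p) for p in candidates}
    left = min_cost_table(node.left, candidates)
    right = min_cost_table(node.right, candidates)
    table = {}
    for p in candidates:
        # Each operand may be produced at any offset q and then moved to p.
        best_left = min(left[q] + move_cost(q, p) for q in candidates)
        best_right = min(right[q] + move_cost(q, p) for q in candidates)
        table[p] = best_left + best_right
    return table


# (a + b) + c with a at offset 0, b at offset 4, c at offset 10.
expr = Node(Node(Node(position=0), Node(position=4)), Node(position=10))
table = min_cost_table(expr, range(0, 11))
best_offset = min(table, key=table.get)
print(best_offset, table[best_offset])
```

The exhaustive minimum over candidate offsets makes the sketch quadratic in the number of candidates per tree node; it is meant to show the structure of the choice, not an efficient method.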
Zeki, Semir
2016-10-01
Results from a variety of sources, some many years old, lead ineluctably to a re-appraisal of the twin strategies of hierarchical and parallel processing used by the brain to construct an image of the visual world. Contrary to common supposition, there are at least three 'feed-forward' anatomical hierarchies that reach the primary visual cortex (V1) and the specialized visual areas outside it, in parallel. These anatomical hierarchies do not conform to the temporal order with which visual signals reach the specialized visual areas through V1. Furthermore, neither the anatomical hierarchies nor the temporal order of activation through V1 predict the perceptual hierarchies. The latter shows that we see (and become aware of) different visual attributes at different times, with colour leading form (orientation) and directional visual motion, even though signals from fast-moving, high-contrast stimuli are among the earliest to reach the visual cortex (of area V5). Parallel processing, on the other hand, is much more ubiquitous than commonly supposed but is subject to a barely noticed but fundamental aspect of brain operations, namely that different parallel systems operate asynchronously with respect to each other and reach perceptual endpoints at different times. This re-assessment leads to the conclusion that the visual brain is constituted of multiple, parallel and asynchronously operating task- and stimulus-dependent hierarchies (STDH); which of these parallel anatomical hierarchies have temporal and perceptual precedence at any given moment is stimulus and task related, and dependent on the visual brain's ability to undertake multiple operations asynchronously. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Data for polarization in charmless $B\to\phi K^*$: A signal for new physics?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Das, Prasanta Kumar; Yang, K.-C.
2005-05-01
The recent observations of sizable transverse fractions of $B\to\phi K^*$ may hint at the existence of new physics. We analyze all possible new-physics four-quark operators and find that two classes of new-physics operators could offer resolutions to the $B\to\phi K^*$ polarization anomaly. The operators in the first class have structures $(1-\gamma_5)\otimes(1-\gamma_5)$, $\sigma(1-\gamma_5)\otimes\sigma(1-\gamma_5)$, and in the second class $(1+\gamma_5)\otimes(1+\gamma_5)$, $\sigma(1+\gamma_5)\otimes\sigma(1+\gamma_5)$. For each class, the new-physics effects can be lumped into a single parameter. Two possible experimental results of polarization phases, $\arg(A_\perp)-\arg(A_\parallel)\approx\pi$ or $0$, originating from the phase ambiguity in data, could be separately accounted for by our two new-physics scenarios: the first (second) scenario with the first (second) class new-physics operators. The consistency between the data and our new-physics analysis suggests a small new-physics weak phase, together with a large(r) strong phase. We obtain sizable transverse fractions $\lambda_{\parallel\parallel}+\lambda_{\perp\perp}\approx\lambda_{00}$, in accordance with the observations. We find $\lambda_{\parallel\parallel}\approx 0.8\,\lambda_{\perp\perp}$ in the first scenario but $\lambda_{\parallel\parallel}\gtrsim\lambda_{\perp\perp}$ in the second scenario. We discuss the impact of the new-physics weak phase on observations.
3D nonrigid registration via optimal mass transport on the GPU.
Ur Rehman, Tauseef; Haber, Eldad; Pryor, Gallagher; Melonakos, John; Tannenbaum, Allen
2009-12-01
In this paper, we present a new computationally efficient numerical scheme for the minimizing flow approach for optimal mass transport (OMT) with applications to non-rigid 3D image registration. The approach utilizes all of the gray-scale data in both images, and the optimal mapping from image A to image B is the inverse of the optimal mapping from B to A. Further, no landmarks need to be specified, and the minimizer of the distance functional involved is unique. Our implementation also employs multigrid and parallel methodologies on a consumer graphics processing unit (GPU) for fast computation. Although computing the optimal map has been shown to be computationally expensive in the past, we show that our approach is orders of magnitude faster than previous work and is capable of finding transport maps with optimality measures (mean curl) previously unattainable by other works (which directly influences the accuracy of registration). We give results where the algorithm was used to compute non-rigid registrations of 3D synthetic data as well as intra-patient pre-operative and post-operative 3D brain MRI datasets.
NASA Astrophysics Data System (ADS)
Lanari, Riccardo; Bonano, Manuela; Buonanno, Sabatino; Casu, Francesco; De Luca, Claudio; Fusco, Adele; Manunta, Michele; Manzo, Mariarosaria; Pepe, Antonio; Zinno, Ivana
2017-04-01
The SENTINEL-1 (S1) mission is designed to provide operational capability for continuous mapping of the Earth thanks to its two polar-orbiting satellites (SENTINEL-1A and B) performing C-band synthetic aperture radar (SAR) imaging. It is, indeed, characterized by enhanced revisit frequency, coverage and reliability for operational services and applications requiring long SAR data time series. Moreover, SENTINEL-1 is specifically oriented to interferometry applications with stringent requirements based on attitude and orbit accuracy and it is intrinsically characterized by small spatial and temporal baselines. Consequently, SENTINEL-1 data are particularly suitable to be exploited through advanced interferometric techniques such as the well-known DInSAR algorithm referred to as Small BAseline Subset (SBAS), which allows the generation of deformation time series and displacement velocity maps. In this work we present an advanced interferometric processing chain, based on the Parallel SBAS (P-SBAS) approach, for the massive processing of S1 Interferometric Wide Swath (IWS) data aimed at generating deformation time series in an efficient, automatic and systematic way. Such a DInSAR chain is designed to exploit distributed computing infrastructures, and more specifically Cloud Computing environments, to properly deal with the storage and the processing of huge S1 datasets. In particular, since S1 IWS data are acquired with the innovative Terrain Observation with Progressive Scans (TOPS) mode, we could benefit from the structure of S1 data, which are composed of bursts that can be considered as separate acquisitions. Indeed, the processing is intrinsically parallelizable with respect to such independent input data and therefore we exploited this coarse granularity parallelization strategy in the majority of the steps of the SBAS processing chain.
Moreover, we also implemented more sophisticated parallelization approaches, exploiting both multi-node and multi-core programming techniques. Currently, Cloud Computing environments make available large collections of computing resources and storage that can be effectively exploited through the presented S1 P-SBAS processing chain to carry out interferometric analyses at a very large scale, in reduced time. This also allows us to deal with the problems connected to the use of the S1 P-SBAS chain in operational contexts, related to hazard monitoring and risk prevention and mitigation, where handling large amounts of data represents a challenging task. As a significant experimental result we performed a large spatial scale SBAS analysis of Central and Southern Italy by exploiting the Amazon Web Services Cloud Computing platform. In particular, we processed in parallel 300 S1 acquisitions covering the Italian peninsula from Lazio to Sicily through the presented S1 P-SBAS processing chain, generating 710 interferograms, thus finally obtaining the displacement time series of the whole processed area. This work has been partially supported by the CNR-DPC agreement, the H2020 EPOS-IP project (GA 676564) and the ESA GEP project.
Feasibility of Decentralized Linear-Quadratic-Gaussian Control of Autonomous Distributed Spacecraft
NASA Technical Reports Server (NTRS)
Carpenter, J. Russell
1999-01-01
A distributed satellite formation, modeled as an arbitrary number of fully connected nodes in a network, could be controlled using a decentralized controller framework that distributes operations in parallel over the network. For such problems, a solution that minimizes data transmission requirements, in the context of linear-quadratic-Gaussian (LQG) control theory, was given by Speyer. This approach is advantageous because it is non-hierarchical, detected failures gracefully degrade system performance, fewer local computations are required than for a centralized controller, and it is optimal with respect to the standard LQG cost function. Disadvantages of the approach are the need for a fully connected communications network, the total operations performed over all the nodes are greater than for a centralized controller, and the approach is formulated for linear time-invariant systems. To investigate the feasibility of the decentralized approach to satellite formation flying, a simple centralized LQG design for a spacecraft orbit control problem is adapted to the decentralized framework. The simple design uses a fixed reference trajectory (an equatorial, Keplerian, circular orbit), and by appropriate choice of coordinates and measurements is formulated as a linear time-invariant system.
Parallel computing techniques for rotorcraft aerodynamics
NASA Astrophysics Data System (ADS)
Ekici, Kivanc
The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, major modifications are made to the implicit operator, Lower-Upper Symmetric Gauss-Seidel (LU-SGS), originally used in TURNS. Second, an inexact Newton method coupled with a Krylov subspace iterative method (Newton-Krylov method) is applied. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).
To build a mine: Prospect to product
NASA Technical Reports Server (NTRS)
Gertsch, Richard E.
1992-01-01
The terrestrial definition of ore is a quantity of earth materials containing a mineral that can be extracted at a profit. While a space-based resource-gathering operation may well be driven by other motives, such an operation should have the most favorable cost-benefit ratio possible. To this end, principles and procedures already tested by the stringent requirements of the profit motive should guide the selection, design, construction, and operation of a space-based mine. Proceeding from project initiation to a fully operational mine requires several interacting and overlapping steps, which are designed to facilitate the decision process and insure economic viability. The steps to achieve a fully operational mine are outlined. Presuming that the approach to developing nonterrestrial resources will parallel that for developing mineral resources on Earth, we can speculate on some of the problems associated with developing lunar and asteroidal resources. The baseline for our study group was a small lunar mine and oxygen extraction facility. The development of this facility is described in accordance with the steps outlined.
NASA Technical Reports Server (NTRS)
Wilson, T. G.
1980-01-01
The development of 5 kW converters with 100 kHz switching frequencies, consisting of two submodules each capable of 2.5 kW of output power, is discussed. Two semiconductor advances allowed increased power levels. First, field effect transistors with ratings of 11 A and 400 V were operated in parallel to provide a converter output power of approximately 2000 W. Second, a bipolar power switching transistor was operated in conjunction with a turn-off snubber circuit to provide converter output power levels approaching 1000 W. The interrelationships between mass, switching frequency, and efficiency were investigated. Converters were constructed for operation at a maximum output power level of 200 W, and a comparison was made for operation under similar input/output conditions for conversion frequencies of 20 kHz and 100 kHz. The effects of nondissipative turn-off snubber circuitry were also examined. Finally, a computerized instrumentation system allowing the measurement of pertinent converter operating conditions as well as the recording of converter waveforms is described.
Improving operating room productivity via parallel anesthesia processing.
Brown, Michael J; Subramanian, Arun; Curry, Timothy B; Kor, Daryl J; Moran, Steven L; Rohleder, Thomas R
2014-01-01
Parallel processing of regional anesthesia may improve operating room (OR) efficiency in patients undergoing upper extremity surgical procedures. The purpose of this paper is to evaluate whether performing regional anesthesia outside the OR in parallel increases total cases per day and improves efficiency and productivity. Data from all adult patients who underwent regional anesthesia as their primary anesthetic for upper extremity surgery over a one-year period were used to develop a simulation model. The model evaluated pure operating modes of regional anesthesia performed within and outside the OR in a parallel manner. The scenarios were used to evaluate how many surgeries could be completed in a standard work day (555 minutes) and, assuming a standard three cases per day, what the predicted end-of-day overtime was. Modeling results show that parallel processing of regional anesthesia increases the average cases per day for all surgeons included in the study. The average increase was 0.42 surgeries per day. Where it was assumed that three cases per day would be performed by all surgeons, the number of days going to overtime was reduced by 43 percent with parallel block. The overtime with parallel anesthesia was also projected to be 40 minutes less per day per surgeon. Key limitations include the assumption that all cases used regional anesthesia in the comparisons. Many days may have both regional and general anesthesia. Also, as a case study, single-center research may limit generalizability. Perioperative care providers should consider parallel administration of regional anesthesia where there is a desire to increase daily upper extremity surgical case capacity. Where there are sufficient resources to do parallel anesthesia processing, efficiency and productivity can be significantly improved. Simulation modeling can be an effective tool to show practice change effects at a system-wide level.
Temperature Control with Two Parallel Small Loop Heat Pipes for GLM Program
NASA Technical Reports Server (NTRS)
Khrustalev, Dmitry; Stouffer, Chuck; Ku, Jentung; Hamilton, Jon; Anderson, Mark
2014-01-01
The concept of temperature control of an electronic component using a single Loop Heat Pipe (LHP) is well established for aerospace applications. Using two LHPs is often desirable for redundancy/reliability reasons or for increasing the overall heat source-sink thermal conductance. This effort elaborates on temperature controlling operation of a thermal system that includes two small ammonia LHPs thermally coupled together at the evaporator end as well as at the condenser end and operating "in parallel". A transient model of the LHP system was developed on the Thermal Desktop™ platform to understand some fundamental details of such parallel operation of the two LHPs. Extensive thermal-vacuum testing was conducted with two thermally coupled LHPs operating simultaneously as well as with only one LHP operating at a time. This paper outlines the temperature control procedures for two LHPs operating simultaneously with widely varying sink temperatures. The test data obtained during the thermal-vacuum testing, with both LHPs running simultaneously in comparison with only one LHP operating at a time, are presented with detailed explanations.
Runtime optimization of an application executing on a parallel computer
None
2014-11-25
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
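The decision procedure in this abstract reduces to a short predicate. The following is a minimal sketch of that logic only; the function name `should_establish_tuning_session` and its arguments are hypothetical and do not correspond to any API named in the record.

```python
from typing import Hashable, Sequence


def should_establish_tuning_session(
    is_root_based: bool, call_sites: Sequence[Hashable]
) -> bool:
    """Return True when the collective should run inside a tuning session.

    call_sites holds the call site reported by each compute node
    (an assumed representation, e.g. a return address or source location).
    """
    if not is_root_based:
        # Non-root-based collectives are always tuned.
        return True
    # Root-based collectives are tuned only if every node identified
    # the operation at the same call site.
    return len(set(call_sites)) == 1


print(should_establish_tuning_session(False, ["a.c:10", "b.c:20"]))
print(should_establish_tuning_session(True, ["a.c:10", "a.c:10"]))
print(should_establish_tuning_session(True, ["a.c:10", "b.c:20"]))
```

In the last case the collective would still execute, only without a tuning session.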
Runtime optimization of an application executing on a parallel computer
Faraj, Daniel A; Smith, Brian E
2014-11-18
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
Runtime optimization of an application executing on a parallel computer
Faraj, Daniel A.; Smith, Brian E.
2013-01-29
Identifying a collective operation within an application executing on a parallel computer; identifying a call site of the collective operation; determining whether the collective operation is root-based; if the collective operation is not root-based: establishing a tuning session and executing the collective operation in the tuning session; if the collective operation is root-based, determining whether all compute nodes executing the application identified the collective operation at the same call site; if all compute nodes identified the collective operation at the same call site, establishing a tuning session and executing the collective operation in the tuning session; and if all compute nodes executing the application did not identify the collective operation at the same call site, executing the collective operation without establishing a tuning session.
PRAIS: Distributed, real-time knowledge-based systems made easy
NASA Technical Reports Server (NTRS)
Goldstein, David G.
1990-01-01
This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
Air Traffic and Operational Data on Selected US Airports with Parallel Runways
NASA Technical Reports Server (NTRS)
Doyle, Thomas M.; McGee, Frank G.
1998-01-01
This report presents information on a number of airports in the country with parallel runways and focuses on those that have at least one pair of parallel runways closer than 4300 ft. Information contained in the report describes the airport's current operational activity as obtained through contact with the facility and from FAA air traffic tower activity data for FY 1997. The primary reason for this document is to provide a single source of information for research to determine airports where Airborne Information for Lateral Spacing (AILS) technology may be applicable.
Parallel fabrication of macroporous scaffolds.
Dobos, Andrew; Grandhi, Taraka Sai Pavan; Godeshala, Sudhakar; Meldrum, Deirdre R; Rege, Kaushal
2018-07-01
Scaffolds generated from naturally occurring and synthetic polymers have been investigated in several applications because of their biocompatibility and tunable chemo-mechanical properties. Existing methods for generation of 3D polymeric scaffolds typically cannot be parallelized, suffer from low throughputs, and do not allow for quick and easy removal of the fragile structures that are formed. Current molds used in hydrogel and scaffold fabrication using solvent casting and porogen leaching are often single-use and do not facilitate 3D scaffold formation in parallel. Here, we describe a simple device and related approaches for the parallel fabrication of macroporous scaffolds. This approach was employed for the generation of macroporous and non-macroporous materials in parallel, in higher throughput and allowed for easy retrieval of these 3D scaffolds once formed. In addition, macroporous scaffolds with interconnected as well as non-interconnected pores were generated, and the versatility of this approach was employed for the generation of 3D scaffolds from diverse materials including an aminoglycoside-derived cationic hydrogel ("Amikagel"), poly(lactic-co-glycolic acid) or PLGA, and collagen. Macroporous scaffolds generated using the device were investigated for plasmid DNA binding and cell loading, indicating the use of this approach for developing materials for different applications in biotechnology. Our results demonstrate that the device-based approach is a simple technology for generating scaffolds in parallel, which can enhance the toolbox of current fabrication techniques. © 2018 Wiley Periodicals, Inc.
NASA Technical Reports Server (NTRS)
Abbott, Terence S.
2011-01-01
This paper presents an overview of an algorithm specifically designed to support NASA's Airborne Precision Spacing concept. This airborne self-spacing concept is trajectory-based, allowing for spacing operations prior to the aircraft being on a common path. This implementation provides the ability to manage spacing against two traffic aircraft, with one of these aircraft operating to a parallel dependent runway. Because this algorithm is trajectory-based, it also has the inherent ability to support required-time-of-arrival (RTA) operations.
NASA Astrophysics Data System (ADS)
Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav
2017-10-01
In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
Modelling parallel programs and multiprocessor architectures with AXE
NASA Technical Reports Server (NTRS)
Yan, Jerry C.; Fineman, Charles E.
1991-01-01
AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior are described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
Dong, Yu-Shuang; Xu, Gao-Chao; Fu, Xiao-Dong
2014-01-01
The cloud platform provides various services to users. More and more cloud centers provide infrastructure as the main way of operating. To improve the utilization rate of the cloud center and to decrease the operating cost, the cloud center provides services according to requirements of users by sharing the resources with virtualization. Considering both QoS for users and cost saving for cloud computing providers, we try to maximize performance and minimize energy cost as well. In this paper, we propose a distributed parallel genetic algorithm (DPGA) of placement strategy for virtual machines deployment on cloud platform. In the first stage, it executes the genetic algorithm in parallel and in a distributed manner on several selected physical hosts. In the second stage, it continues to execute the genetic algorithm with the solutions obtained from the first stage as the initial population. The solution calculated by the genetic algorithm of the second stage is the optimal one of the proposed approach. The experimental results show that the proposed placement strategy of VM deployment can ensure QoS for users and it is more effective and more energy efficient than other placement strategies on the cloud platform. PMID:25097872
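The two-stage structure described above can be sketched as follows. This is an illustrative toy, not the authors' DPGA: the cost function (hosts in use plus an overload penalty), the operators, and all names and parameters are assumptions, and the first-stage runs are executed one after another here rather than on separate physical hosts.

```python
import random


def cost(placement, vm_load, host_cap):
    """Assumed objective: hosts in use plus a penalty per unit of overload."""
    used = {}
    for vm, host in enumerate(placement):
        used[host] = used.get(host, 0) + vm_load[vm]
    overload = sum(max(0, load - host_cap) for load in used.values())
    return len(used) + 10 * overload


def evolve(population, vm_load, host_cap, n_hosts, generations, rng):
    """Elitist GA: keep the better half, refill by crossover and mutation."""
    key = lambda p: cost(p, vm_load, host_cap)
    population = sorted(population, key=key)
    for _ in range(generations):
        survivors = population[: max(2, len(population) // 2)]
        children = []
        while len(survivors) + len(children) < len(population):
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(a))       # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:               # point mutation
                child[rng.randrange(len(child))] = rng.randrange(n_hosts)
            children.append(child)
        population = sorted(survivors + children, key=key)
    return population


def two_stage_ga(vm_load, host_cap, n_hosts, n_runs=4, pop_size=20,
                 generations=30, seed=0):
    rng = random.Random(seed)
    n_vms = len(vm_load)

    def random_placement():
        return [rng.randrange(n_hosts) for _ in range(n_vms)]

    # Stage 1: independent GA runs (stand-ins for the selected hosts).
    stage_one_best = []
    for _ in range(n_runs):
        pop = [random_placement() for _ in range(pop_size)]
        best = evolve(pop, vm_load, host_cap, n_hosts, generations, rng)[0]
        stage_one_best.append(best)

    # Stage 2: seed the initial population with the stage-1 solutions.
    pop = [list(p) for p in stage_one_best]
    while len(pop) < pop_size:
        pop.append(random_placement())
    final = evolve(pop, vm_load, host_cap, n_hosts, generations, rng)[0]
    return stage_one_best, final


loads = [2, 3, 1, 4, 2, 3]
stage_one, best = two_stage_ga(loads, host_cap=8, n_hosts=4)
print(best, cost(best, loads, 8))
```

Because the sketch always carries the best individual forward, the second-stage result is never worse than the best first-stage solution under the assumed cost.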
Toward Petascale Biologically Plausible Neural Networks
NASA Astrophysics Data System (ADS)
Long, Lyle
This talk will describe an approach to achieving petascale neural networks. Artificial intelligence has been oversold for many decades. Computers in the beginning could only do about 16,000 operations per second. Computer processing power, however, has been doubling every two years thanks to Moore's law, and growing even faster due to massively parallel architectures. Finally, 60 years after the first AI conference we have computers on the order of the performance of the human brain (10^16 operations per second). The main issues now are algorithms, software, and learning. We have excellent models of neurons, such as the Hodgkin-Huxley model, but we do not know how the human neurons are wired together. With careful attention to efficient parallel computing, event-driven programming, table lookups, and memory minimization, massive scale simulations can be performed. The code that will be described was written in C++ and uses the Message Passing Interface (MPI). It uses the full Hodgkin-Huxley neuron model, not a simplified model. It also allows arbitrary network structures (deep, recurrent, convolutional, all-to-all, etc.). The code is scalable, and has, so far, been tested on up to 2,048 processor cores using 10^7 neurons and 10^9 synapses.
Electromagnetic Design of a Magnetically Coupled Spatial Power Combiner
NASA Astrophysics Data System (ADS)
Bulcha, B. T.; Cataldo, G.; Stevenson, T. R.; U-Yen, K.; Moseley, S. H.; Wollack, E. J.
2018-04-01
The design of a two-dimensional spatial beam-combining network employing a parallel-plate superconducting waveguide filled with a monocrystalline silicon dielectric substrate is presented. This component uses arrays of magnetically coupled antenna elements to achieve high coupling efficiency and full sampling of the intensity distribution while avoiding diffractive losses in the multimode waveguide region. These attributes enable the structure's use in realizing compact far-infrared spectrometers for astrophysical and instrumentation applications. If unterminated, reflections within a finite-sized spatial beam combiner can potentially lead to spurious couplings between elements. A planar meta-material electromagnetic absorber is implemented to control this response within the device. This broadband termination absorbs greater than 0.99 of the power over the 1.7:1 operational band at angles ranging from normal to near-parallel incidence. The design approach, simulations and applications of the spatial power combiner and meta-material termination structure are presented.
Fast encryption of RGB color digital images using a tweakable cellular automaton based schema
NASA Astrophysics Data System (ADS)
Faraoun, Kamel Mohamed
2014-12-01
We propose a new tweakable construction of block-enciphers using second-order reversible cellular automata, and we apply it to encipher RGB-colored images. The proposed construction permits parallel encryption of the image content by extending the standard definition of a block cipher to take into account a supplementary parameter used as a tweak (nonce) to control the behavior of the cipher from one region of the image to the other, and hence avoids the need for slow sequential modes of operation. The proposed construction defines a flexible pseudorandom permutation that can be used effectively to solve the electronic code book problem without the need for a specific sequential mode. Results obtained from various experiments show that the proposed schema achieves high security and execution performance, and enables an interesting mode of selective area decryption due to the parallel character of the approach.
NASA Astrophysics Data System (ADS)
Piras, Paolo; Torromeo, Concetta; Re, Federica; Evangelista, Antonietta; Gabriele, Stefano; Esposito, Giuseppe; Nardinocchi, Paola; Teresi, Luciano; Madeo, Andrea; Chialastri, Claudia; Schiariti, Michele; Varano, Valerio; Uguccioni, Massimo; Puddu, Paolo E.
2016-10-01
The analysis of full Left Atrium (LA) deformation and whole LA deformational trajectory in time has been poorly investigated and, to the best of our knowledge, seldom discussed in patients with Hypertrophic Cardiomyopathy. Therefore, we considered 22 patients with Hypertrophic Cardiomyopathy (HCM) and 46 healthy subjects, investigated them by three-dimensional Speckle Tracking Echocardiography, and studied the derived landmark clouds via Geometric Morphometrics with Parallel Transport. Trajectory shape and trajectory size were different in Controls versus HCM and their classification powers had high AUC (Area Under the Receiver Operating Characteristic Curve) and accuracy. The two trajectories differed markedly at the transition between LA conduit and booster pump functions. Full shape and deformation analyses with trajectory analysis enabled a straightforward perception of pathophysiological consequences of HCM condition on LA functioning. It might be worthwhile to apply these techniques to look for novel pathophysiological approaches that may better define atrio-ventricular interaction.
Temporal Precedence Checking for Switched Models and its Application to a Parallel Landing Protocol
NASA Technical Reports Server (NTRS)
Duggirala, Parasara Sridhar; Wang, Le; Mitra, Sayan; Viswanathan, Mahesh; Munoz, Cesar A.
2014-01-01
This paper presents an algorithm for checking temporal precedence properties of nonlinear switched systems. This class of properties subsumes bounded safety and captures requirements about visiting a sequence of predicates within given time intervals. The algorithm handles nonlinear predicates that arise from dynamics-based predictions used in alerting protocols for state-of-the-art transportation systems. It is sound and complete for nonlinear switched systems that robustly satisfy the given property. The algorithm is implemented in the Compare Execute Check Engine (C2E2) using validated simulations. As a case study, a simplified model of an alerting system for closely spaced parallel runways is considered. The proposed approach is applied to this model to check safety properties of the alerting logic for different operating conditions such as initial velocities, bank angles, aircraft longitudinal separation, and runway separation.
NASA Technical Reports Server (NTRS)
Tesch, W. A.; Steenken, W. G.
1976-01-01
The results are presented of a one-dimensional dynamic digital blade row compressor model study of a J85-13 engine operating with uniform and with circumferentially distorted inlet flow. Details of the geometry and the derived blade row characteristics used to simulate the clean inlet performance are given. A stability criterion based upon the self developing unsteady internal flows near surge provided an accurate determination of the clean inlet surge line. The basic model was modified to include an arbitrary extent multi-sector parallel compressor configuration for investigating 180 deg 1/rev total pressure, total temperature, and combined total pressure and total temperature distortions. The combined distortions included opposed, coincident, and 90 deg overlapped patterns. The predicted losses in surge pressure ratio matched the measured data trends at all speeds and gave accurate predictions at high corrected speeds where the slope of the speed lines approached the vertical.
An Analytical Time–Domain Expression for the Net Ripple Produced by Parallel Interleaved Converters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, Brian B.; Krein, Philip T.
We apply modular arithmetic and Fourier series to analyze the superposition of N interleaved triangular waveforms with identical amplitudes and duty-ratios. Here, interleaving refers to the condition when a collection of periodic waveforms with identical periods are each uniformly phase-shifted across one period. The main result is a time-domain expression which provides an exact representation of the summed and interleaved triangular waveforms, where the peak amplitude and parameters of the time-periodic component are all specified in closed-form. Analysis is general and can be used to study various applications in multi-converter systems. This model is unique not only in that it reveals a simple and intuitive expression for the net ripple, but its derivation via modular arithmetic and Fourier series is distinct from prior approaches. The analytical framework is experimentally validated with a system of three parallel converters under time-varying operating conditions.
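The interleaving condition described above can be explored numerically. The following is a minimal sketch, not the closed-form expression from the paper: it sums N idealized triangular waveforms by brute force. The function names (`triangle`, `net_ripple`, `peak_to_peak`), the unit period, the unit peak-to-peak amplitude, and the zero-mean shape are all assumptions made for illustration.

```python
def triangle(t, duty):
    """Zero-mean triangular wave, period 1, peak-to-peak amplitude 1.

    Assumed shape: rises linearly for a fraction `duty` of the period,
    then falls linearly for the remainder. Requires 0 < duty < 1.
    """
    x = t % 1.0
    if x < duty:
        return -0.5 + x / duty
    return 0.5 - (x - duty) / (1.0 - duty)


def net_ripple(t, n, duty):
    """Sum of n identical triangles, each phase-shifted by 1/n of a period."""
    return sum(triangle(t + k / n, duty) for k in range(n))


def peak_to_peak(n, duty, samples=6000):
    """Sampled estimate of the peak-to-peak amplitude of the net ripple."""
    values = [net_ripple(i / samples, n, duty) for i in range(samples)]
    return max(values) - min(values)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(peak_to_peak(n, duty=0.4), 4))
```

In this idealized model the summed waveform repeats N times per period, and it cancels entirely when the duty ratio is a multiple of 1/N, because the rising and falling slopes then balance at every instant.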
Domain Decomposition By the Advancing-Partition Method
NASA Technical Reports Server (NTRS)
Pirzadeh, Shahyar Z.
2008-01-01
A new method of domain decomposition has been developed for generating unstructured grids in subdomains either sequentially or using multiple computers in parallel. Domain decomposition is a crucial and challenging step for parallel grid generation. Prior methods are generally based on auxiliary, complex, and computationally intensive operations for defining partition interfaces and usually produce grids of lower quality than those generated in single domains. The new technique, referred to as "Advancing Partition," is based on the Advancing-Front method, which partitions a domain as part of the volume mesh generation in a consistent and "natural" way. The benefits of this approach are: 1) the process of domain decomposition is highly automated, 2) partitioning of domain does not compromise the quality of the generated grids, and 3) the computational overhead for domain decomposition is minimal. The new method has been implemented in NASA's unstructured grid generation code VGRID.
Comparison of Procedures for Dual and Triple Closely Spaced Parallel Runways
NASA Technical Reports Server (NTRS)
Verma, Savita; Ballinger, Deborah; Subramanian, Shobana; Kozon, Thomas
2012-01-01
A human-in-the-loop high fidelity flight simulation experiment was conducted, which investigated and compared breakout procedures for Very Closely Spaced Parallel Approaches (VCSPA) with two and three runways. To understand the feasibility, usability and human factors of two and three runway VCSPA, data were collected and analyzed on the dependent variables of breakout cross track error and pilot workload. Independent variables included number of runways, cause of breakout and location of breakout. Results indicated larger cross track error and higher workload using three runways as compared to 2-runway operations. Significant interaction effects involving breakout cause and breakout location were also observed. Across all conditions, cross track error values showed high levels of breakout trajectory accuracy and pilot workload remained manageable. Results suggest possible avenues of future adaptation for adopting these procedures (e.g., pilot training), while also showing potential promise of the concept.
NASA Technical Reports Server (NTRS)
Tilton, James C.
1988-01-01
Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
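The idea of performing the globally best merge first can be illustrated on a one-dimensional signal. This is a serial toy sketch under stated assumptions, not the MPP implementation: the region representation, the mean-difference merge criterion, and the names `segment` and `max_diff` are hypothetical.

```python
def segment(values, max_diff):
    """Region growing on a 1-D signal, always taking the globally best merge.

    Each region is (start, end_exclusive, sum_of_values). At every step the
    adjacent pair whose means differ least is merged, so the outcome does
    not depend on the order in which the signal is scanned. Merging stops
    when the best available pair differs by more than `max_diff`.
    """
    regions = [(i, i + 1, float(v)) for i, v in enumerate(values)]

    def mean(region):
        start, end, total = region
        return total / (end - start)

    def gap(k):
        return abs(mean(regions[k]) - mean(regions[k + 1]))

    while len(regions) > 1:
        best = min(range(len(regions) - 1), key=gap)  # global search
        if gap(best) > max_diff:
            break
        left, right = regions[best], regions[best + 1]
        regions[best:best + 2] = [(left[0], right[1], left[2] + right[2])]
    return [(start, end) for start, end, _ in regions]


if __name__ == "__main__":
    print(segment([1, 1, 2, 9, 9, 8, 3, 3], max_diff=1.5))
```

A scan-order-dependent region grower would merge the first acceptable neighbor it meets; here the choice is made over all candidate pairs at once, which is the property the abstract attributes to the iterative parallel algorithm.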
Architecture Adaptive Computing Environment
NASA Technical Reports Server (NTRS)
Dorband, John E.
2006-01-01
Architecture Adaptive Computing Environment (aCe) is a software system that includes a language, compiler, and run-time library for parallel computing. aCe was developed to enable programmers to write programs, more easily than was previously possible, for a variety of parallel computing architectures. Heretofore, it has been perceived to be difficult to write parallel programs for parallel computers and more difficult to port the programs to different parallel computing architectures. In contrast, aCe is supportable on all high-performance computing architectures. Currently, it is supported on LINUX clusters. aCe uses parallel programming constructs that facilitate writing of parallel programs. Such constructs were used in single-instruction/multiple-data (SIMD) programming languages of the 1980s, including Parallel Pascal, Parallel Forth, C*, *LISP, and MasPar MPL. In aCe, these constructs are extended and implemented for both SIMD and multiple-instruction/multiple-data (MIMD) architectures. Two new constructs incorporated in aCe are those of (1) scalar and virtual variables and (2) pre-computed paths. The scalar-and-virtual-variables construct increases flexibility in optimizing memory utilization in various architectures. The pre-computed-paths construct enables the compiler to pre-compute part of a communication operation once, rather than computing it every time the communication operation is performed.
Automatic Management of Parallel and Distributed System Resources
NASA Technical Reports Server (NTRS)
Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.
1990-01-01
Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
NASA Technical Reports Server (NTRS)
Pritchett, Amy R.; Hansman, R. John
1997-01-01
Efforts to increase airport capacity include studies of aircraft systems that would enable simultaneous approaches to closely spaced parallel runways in Instrument Meteorological Conditions (IMC). The time-critical nature of a parallel approach results in key design issues for current and future collision avoidance systems. Two part-task flight simulator studies have examined the procedural and display issues inherent in such a time-critical task, the interaction of the pilot with a collision avoidance system, and the alerting criteria and avoidance maneuvers preferred by subjects.
Approximation algorithms for scheduling unrelated parallel machines with release dates
NASA Astrophysics Data System (ADS)
Avdeenko, T. V.; Mesentsev, Y. A.; Estraykh, I. V.
2017-01-01
In this paper we propose approaches to optimal scheduling of unrelated parallel machines with release dates. One approach is based on a dynamic programming scheme modified with adaptive narrowing of the search domain, which ensures its computational effectiveness. We discuss the complexity of synthesizing exact schedules and compare it with that of approximate, close-to-optimal solutions. We also explain how the algorithm works using an example of two unrelated parallel machines and five jobs with release dates. Performance results showing the efficiency of the proposed approach are given.
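To make the problem setting concrete, here is a small sketch of an instance with two unrelated machines and five jobs with release dates. It uses exhaustive enumeration rather than the dynamic programming scheme of the paper, and it assumes the objective is the makespan, which the abstract does not state. The job data and the names `machine_finish` and `best_schedule` are invented for illustration.

```python
from itertools import product


def machine_finish(jobs, release, proc_on_machine):
    """Completion time of one machine that runs `jobs` in release-date order.

    For a single machine with release dates, processing jobs in order of
    release date minimizes the makespan, so no other order is tried.
    """
    t = 0
    for j in sorted(jobs, key=lambda job: release[job]):
        t = max(t, release[j]) + proc_on_machine[j]
    return t


def best_schedule(release, proc):
    """Try every job-to-machine assignment; return (makespan, assignment).

    proc[m][j] is the processing time of job j on machine m ("unrelated"
    means these times need not be proportional across machines).
    """
    n_machines, n_jobs = len(proc), len(release)
    best = None
    for assign in product(range(n_machines), repeat=n_jobs):
        makespan = max(
            machine_finish(
                [j for j in range(n_jobs) if assign[j] == m], release, proc[m]
            )
            for m in range(n_machines)
        )
        if best is None or makespan < best[0]:
            best = (makespan, assign)
    return best


if __name__ == "__main__":
    release = [0, 1, 2, 4, 6]          # hypothetical release dates
    proc = [[3, 2, 4, 1, 3],           # hypothetical times on machine 0
            [2, 4, 1, 3, 2]]           # hypothetical times on machine 1
    print(best_schedule(release, proc))
```

Enumeration visits every one of the m^n assignments, which is only workable for toy instances; that growth is the reason narrowing the search domain matters for exact methods.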
DOE Office of Scientific and Technical Information (OSTI.GOV)
HUMPHREYS, D C
A parallel readiness assessment (RA) was conducted by independent Fluor Hanford (FH) and U.S. Department of Energy, Richland Operations Office (RL) teams to verify that an adequate state of readiness had been achieved for activities associated with the packaging and shipping of pressurized water reactor fuel assemblies from B-Cell in the 324 Building to the interim storage area at the Canister Storage Building in the 200 Area. The RL review was conducted in parallel with the FH review in accordance with the Joint RL/FH Implementation Plan (Appendix B). The RL RA Team members were assigned a FH RA Team counterpart for the review. With this one-on-one approach, the RL RA Team was able to assess the FH Team's performance, competence, and adherence to the implementation plan and evaluate the level of facility readiness. The RL RA Team agrees with the FH determination that startup of the 324 Building B-Cell pressurized water reactor spent nuclear fuel packaging and shipping operations can safely proceed, pending completion of the identified pre-start items in the FH final report (see Appendix A), completion of the manageable list of open items included in the facility's declaration of readiness, and execution of the startup plan to operations.
Comparison of candidate solar array maximum power utilization approaches. [for spacecraft propulsion]
NASA Technical Reports Server (NTRS)
Costogue, E. N.; Lindena, S.
1976-01-01
A study was made of five potential approaches that can be utilized to detect the maximum power point of a solar array while sustaining operations at or near maximum power and without endangering stability or causing array voltage collapse. The approaches studied included: (1) dynamic impedance comparator, (2) reference array measurement, (3) onset of solar array voltage collapse detection, (4) parallel tracker, and (5) direct measurement. The study analyzed the feasibility and adaptability of these approaches to a future solar electric propulsion (SEP) mission, and, specifically, to a comet rendezvous mission. Such missions presented the most challenging requirements to a spacecraft power subsystem in terms of power management over large solar intensity ranges of 1.0 to 3.5 AU. The dynamic impedance approach was found to have the highest figure of merit, and the reference array approach followed closely behind. The results are applicable to terrestrial solar power systems as well as to other than SEP space missions.
Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan
2012-12-11
A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.
Parallel computing for probabilistic fatigue analysis
NASA Technical Reports Server (NTRS)
Sues, Robert H.; Lua, Yuan J.; Smith, Mark D.
1993-01-01
This paper presents the results of Phase I research to investigate the most effective parallel processing software strategies and hardware configurations for probabilistic structural analysis. We investigate the efficiency of both shared and distributed-memory architectures via a probabilistic fatigue life analysis problem. We also present a parallel programming approach, the virtual shared-memory paradigm, that is applicable across both types of hardware. Using this approach, problems can be solved on a variety of parallel configurations, including networks of single or multiprocessor workstations. We conclude that it is possible to effectively parallelize probabilistic fatigue analysis codes; however, special strategies will be needed to achieve large-scale parallelism to keep large number of processors busy and to treat problems with the large memory requirements encountered in practice. We also conclude that distributed-memory architecture is preferable to shared-memory for achieving large scale parallelism; however, in the future, the currently emerging hybrid-memory architectures will likely be optimal.
Distributed computing feasibility in a non-dedicated homogeneous distributed system
NASA Technical Reports Server (NTRS)
Leutenegger, Scott T.; Sun, Xian-He
1993-01-01
The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. The feasibility of such a non-dedicated parallel processing environment, assuming workstation processes have preemptive priority over parallel tasks, is addressed. An analytical model is developed to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. It is proposed that task ratio is a useful metric for determining how large the demand of a parallel application must be in order to make efficient use of a non-dedicated distributed system.
A Tutorial on Parallel and Concurrent Programming in Haskell
NASA Astrophysics Data System (ADS)
Peyton Jones, Simon; Singh, Satnam
This practical tutorial introduces the features available in Haskell for writing parallel and concurrent programs. We first describe how to write semi-explicit parallel programs by using annotations to express opportunities for parallelism and to help control the granularity of parallelism for effective execution on modern operating systems and processors. We then describe the mechanisms provided by Haskell for writing explicitly parallel programs with a focus on the use of software transactional memory to help share information between threads. Finally, we show how nested data parallelism can be used to write deterministically parallel programs which allows programmers to use rich data types in data parallel programs which are automatically transformed into flat data parallel versions for efficient execution on multi-core processors.
Ferrucci, Filomena; Salza, Pasquale; Sarro, Federica
2017-06-29
The need to improve the scalability of Genetic Algorithms (GAs) has motivated the research on Parallel Genetic Algorithms (PGAs), and different technologies and approaches have been used. Hadoop MapReduce represents one of the most mature technologies to develop parallel algorithms. Based on the fact that parallel algorithms introduce communication overhead, the aim of the present work is to understand if, and possibly when, the parallel GAs solutions using Hadoop MapReduce show better performance than sequential versions in terms of execution time. Moreover, we are interested in understanding which PGA model can be most effective among the global, grid, and island models. We empirically assessed the performance of these three parallel models with respect to a sequential GA on a software engineering problem, evaluating the execution time and the achieved speedup. We also analysed the behaviour of the parallel models in relation to the overhead produced by the use of Hadoop MapReduce and the GAs' computational effort, which gives a more machine-independent measure of these algorithms. We exploited three problem instances to differentiate the computation load and three cluster configurations based on 2, 4, and 8 parallel nodes. Moreover, we estimated the costs of the execution of the experimentation on a potential cloud infrastructure, based on the pricing of the major commercial cloud providers. The empirical study revealed that the use of PGA based on the island model outperforms the other parallel models and the sequential GA for all the considered instances and clusters. Using 2, 4, and 8 nodes, the island model achieves an average speedup over the three datasets of 1.8, 3.4, and 7.0 times, respectively. Hadoop MapReduce has a set of different constraints that need to be considered during the design and the implementation of parallel algorithms. 
The overhead of data store (i.e., HDFS) accesses, communication, and latency requires solutions that reduce data store operations. For this reason, the island model is more suitable for PGAs than the global and grid model, also in terms of costs when executed on a commercial cloud provider.
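The island model mentioned above keeps several subpopulations that evolve independently and exchange individuals only occasionally, which is why it needs few shared data store operations. The sketch below is a plain sequential Python illustration, not the Hadoop MapReduce implementation from the study. The toy OneMax objective, the ring migration scheme, and every name and parameter (`evolve`, `island_ga`, `migrate_every`, and so on) are assumptions chosen for brevity.

```python
import random


def fitness(bits):
    """Toy objective (OneMax): the number of 1-bits in the individual."""
    return sum(bits)


def evolve(pop, rng, mutation_rate=0.02):
    """One generation on a single island: keep the best, breed the rest."""
    ranked = sorted(pop, key=fitness, reverse=True)
    parents = ranked[:max(2, len(ranked) // 2)]
    children = [ranked[0]]  # elitism: the island's best always survives
    while len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))            # one-point crossover
        child = a[:cut] + b[cut:]
        child = [bit ^ (rng.random() < mutation_rate) for bit in child]
        children.append(child)
    return children


def island_ga(n_islands=4, pop_size=20, n_bits=32, generations=40,
              migrate_every=5, seed=0):
    """Evolve islands independently; migrate along a ring every few generations."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Each island's step depends only on its own population, so in a
        # real deployment these calls could run on separate nodes.
        islands = [evolve(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda k: fitness(pop[k]))
                pop[worst] = list(bests[(i - 1) % n_islands])
    return max((ind for pop in islands for ind in pop), key=fitness)


if __name__ == "__main__":
    print(fitness(island_ga()))
```

Communication happens only at migration points, so raising `migrate_every` trades mixing between islands for less coordination, which mirrors the overhead argument made in the abstract.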
3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies
Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco
2014-01-01
The Nonlocal Means (NLM) algorithm is widely considered a state-of-the-art denoising filter in many research fields. Its high computational complexity has led researchers to develop parallel programming approaches and to use massively parallel architectures such as GPUs. In recent years, GPU devices have made it possible to achieve reasonable running times by filtering 3D datasets slice by slice with a 2D NLM algorithm. In our approach we design and implement a fully 3D NonLocal Means parallel approach, adopting different algorithm mapping strategies on GPU architecture and multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the usability of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
Parallel Implementation of the Discontinuous Galerkin Method
NASA Technical Reports Server (NTRS)
Baggag, Abdalkader; Atkins, Harold; Keyes, David
1999-01-01
This paper describes a parallel implementation of the discontinuous Galerkin method. Discontinuous Galerkin is a spatially compact method that retains its accuracy and robustness on non-smooth unstructured grids and is well suited for time dependent simulations. Several parallelization approaches are studied and evaluated. The most natural and symmetric of the approaches has been implemented in an object-oriented code used to simulate aeroacoustic scattering. The parallel implementation is MPI-based and has been tested on various parallel platforms such as the SGI Origin, IBM SP2, and clusters of SGI and Sun workstations. The scalability results presented for the SGI Origin show slightly superlinear speedup on a fixed-size problem due to cache effects.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Design of a dataway processor for a parallel image signal processing system
NASA Astrophysics Data System (ADS)
Nomura, Mitsuru; Fujii, Tetsuro; Ono, Sadayasu
1995-04-01
Recently, demands for high-speed signal processing have been increasing, especially in the fields of image data compression, computer graphics, and medical imaging. To achieve sufficient power for real-time image processing, we have been developing parallel signal-processing systems. This paper describes a communication processor called 'dataway processor' designed for a new scalable parallel signal-processing system. The processor has six high-speed communication links (Dataways), a data-packet routing controller, a RISC CORE, and a DMA controller. Each communication link is 8 bits wide and operates in full-duplex mode at 50 MHz. Moreover, data routing, DMA, and CORE operations are processed in parallel. Therefore, sufficient throughput is available for high-speed digital video signals. The processor is designed in a top-down fashion using a CAD system called 'PARTHENON.' The hardware is fabricated using 0.5-micrometer CMOS technology and comprises about 200 K gates.
NASA Technical Reports Server (NTRS)
Halpert, G.; Webb, D. A.
1983-01-01
Three batteries were operated in parallel from a common bus during charge and discharge. SMM utilized NASA Standard 20AH cells and batteries, and LANDSAT-D NASA 50AH cells and batteries of a similar design. Each battery consisted of 22 series connected cells providing the nominal 28V bus. The three batteries were charged in parallel using the voltage limit/current taper mode wherein the voltage limit was temperature compensated. Discharge occurred on the demand of the spacecraft instruments and electronics. Both flights were planned for three to five year missions. The series/parallel configuration of cells and batteries for the 3-5 yr mission required a well controlled product with built-in reliability and uniformity. Examples of how component, cell and battery selection methods affect the uniformity of the series/parallel operation of the batteries both in testing and in flight are given.
Yip, Hon Ming; Li, John C. S.; Cui, Xin; Gao, Qiannan; Leung, Chi Chiu
2014-01-01
As microfluidics has been applied extensively in many cell and biochemical applications, monitoring the related processes is an important requirement. In this work, we design and fabricate a high-throughput microfluidic device which contains 32 microchambers to perform automated parallel microfluidic operations and monitoring on an automated stage of a microscope. Images are captured at multiple spots on the device during the operations for monitoring samples in microchambers in parallel; yet the device positions may vary at different time points throughout operations as the device moves back and forth on a motorized microscopic stage. Here, we report an image-based positioning strategy to realign the chamber position before every recording of microscopic image. We fabricate alignment marks at defined locations next to the chambers in the microfluidic device as reference positions. We also develop image processing algorithms to recognize the chamber positions in real-time, followed by realigning the chambers to their preset positions in the captured images. We perform experiments to validate and characterize the device functionality and the automated realignment operation. Together, this microfluidic realignment strategy can be a platform technology to achieve precise positioning of multiple chambers for general microfluidic applications requiring long-term parallel monitoring of cell and biochemical activities. PMID:25133248
Simple and powerful visual stimulus generator.
Kremlácek, J; Kuba, M; Kubová, Z; Vít, F
1999-02-01
We describe a cheap, simple, portable and efficient approach to visual stimulation for neurophysiology which does not need any special hardware equipment. The method, based on an animation technique, uses the Autodesk Animator FLI format. The animation is replayed by a special program ('player') that provides synchronisation pulses to the recording system via the parallel port. The 'player' runs on an IBM-compatible personal computer under the MS-DOS operating system, and the stimulus is displayed on a VGA computer monitor. Various stimuli created with this technique for visual evoked potentials (VEPs) are presented.
Traffic Flow Management and Optimization
NASA Technical Reports Server (NTRS)
Rios, Joseph Lucio
2014-01-01
This talk will present an overview of Traffic Flow Management (TFM) research at NASA Ames Research Center. Dr. Rios will focus on his work developing a large-scale, parallel approach to solving traffic flow management problems in the national airspace. In support of this talk, Dr. Rios will provide some background on operational aspects of TFM as well as a discussion of some of the tools needed to perform such work, including a high-fidelity airspace simulator. Current, ongoing research related to TFM data services in the national airspace system and general aviation will also be presented.
Microscale bioprocess optimisation.
Micheletti, Martina; Lye, Gary J
2006-12-01
Microscale processing techniques offer the potential to speed up the delivery of new drugs to the market, reducing development costs and increasing patient benefit. These techniques have application across both the chemical and biopharmaceutical sectors. The approach involves the study of individual bioprocess operations at the microlitre scale using either microwell or microfluidic formats. In both cases the aim is to generate quantitative bioprocess information early on, so as to inform bioprocess design and speed translation to the manufacturing scale. Automation can enhance experimental throughput and will facilitate the parallel evaluation of competing biocatalyst and process options.
The cognitive architecture for chaining of two mental operations.
Sackur, Jérôme; Dehaene, Stanislas
2009-05-01
A simple view, which dates back to Turing, proposes that complex cognitive operations are composed of serially arranged elementary operations, each passing intermediate results to the next. However, whether and how such serial processing is achieved with a brain composed of massively parallel processors, remains an open question. Here, we study the cognitive architecture for chained operations with an elementary arithmetic algorithm: we required participants to add (or subtract) two to a digit, and then compare the result with five. In four experiments, we probed the internal implementation of this task with chronometric analysis, the cued-response method, the priming method, and a subliminal forced-choice procedure. We found evidence for an approximately sequential processing, with an important qualification: the second operation in the algorithm appears to start before completion of the first operation. Furthermore, initially the second operation takes as input the stimulus number rather than the output of the first operation. Thus, operations that should be processed serially are in fact executed partially in parallel. Furthermore, although each elementary operation can proceed subliminally, their chaining does not occur in the absence of conscious perception. Overall, the results suggest that chaining is slow, effortful, imperfect (resulting partly in parallel rather than serial execution) and dependent on conscious control.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.
NASA Technical Reports Server (NTRS)
Agrawal, Gagan; Sussman, Alan; Saltz, Joel
1993-01-01
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications to distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler-parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.
Tensor methodology and computational geometry in direct computational experiments in fluid mechanics
NASA Astrophysics Data System (ADS)
Degtyarev, Alexander; Khramushin, Vasily; Shichkina, Julia
2017-07-01
The paper considers a generalized functional and algorithmic construction of direct computational experiments in fluid dynamics. The notation of tensor mathematics is naturally embedded in the finite-element operation in the construction of numerical schemes. A large fluid particle, which has a finite size, its own weight, internal displacement and deformation, is considered as an elementary computing object. The tensor representation of computational objects becomes a straightforward, linear and unique approximation of elementary volumes and the fluid particles inside them. The proposed approach allows the use of an explicit numerical scheme, which is an important condition for increasing the efficiency of the algorithms developed by numerical procedures with natural parallelism. It is shown that the advantages of the proposed approach are achieved, among other things, by considering the representation of the motion of large particles of a continuous medium in dual coordinate systems and by computing operations in the projections of these two coordinate systems with direct and inverse transformations. Thus a new method for the mathematical representation and synthesis of computational experiments based on the large particle method is proposed.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
NASA Technical Reports Server (NTRS)
Athale, R. A.; Lee, S. H.
1978-01-01
The paper describes the fabrication and operation of an optical parallel logic (OPAL) device which performs Boolean algebraic operations on binary images. Several logic operations on two input binary images were demonstrated using an 8 x 8 device with a CdS photoconductor and a twisted nematic liquid crystal. Two such OPAL devices can be interconnected to form a half-adder circuit which is one of the essential components of a CPU in a digital signal processor.
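The half-adder mentioned above can be illustrated in software. The following is a minimal sketch, assuming binary images are stored as lists of rows of 0/1 values; the function name `half_adder` and the sample images are hypothetical and do not model the optical device itself. It only shows the two pixel-wise Boolean operations a half-adder needs: XOR for the sum bit and AND for the carry bit.

```python
def half_adder(image_a, image_b):
    """Pixel-wise half adder on two equally sized binary images.

    Each image is a list of rows, each row a list of 0/1 pixels.
    Returns (sum_image, carry_image).
    """
    # Sum bit of a half adder is XOR of the two inputs.
    sum_image = [[a ^ b for a, b in zip(row_a, row_b)]
                 for row_a, row_b in zip(image_a, image_b)]
    # Carry bit of a half adder is AND of the two inputs.
    carry_image = [[a & b for a, b in zip(row_a, row_b)]
                   for row_a, row_b in zip(image_a, image_b)]
    return sum_image, carry_image


# Hypothetical 2 x 2 inputs covering all four input combinations.
image_a = [[0, 0], [1, 1]]
image_b = [[0, 1], [0, 1]]
sum_image, carry_image = half_adder(image_a, image_b)
```

Every pixel is processed independently, which is why this kind of operation maps naturally onto a device that handles the whole image at once.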
20 kHz main inverter unit. [for space station power supplies
NASA Technical Reports Server (NTRS)
Hussey, S.
1989-01-01
A proof-of-concept main inverter unit has demonstrated the operation of a pulse-width-modulated parallel resonant power stage topology as a 20-kHz ac power source driver, showing simple output regulation, parallel operation, power sharing and short-circuit operation. The use of a two-stage dc input filter controls the electromagnetic compatibility (EMC) characteristics of the dc power bus, and the use of an ac harmonic trap controls the EMC characteristics of the 20-kHz ac power bus.
Programming parallel architectures: The BLAZE family of languages
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush
1988-01-01
Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Tile-based Level of Detail for the Parallel Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niski, K; Cohen, J D
Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
NASA Astrophysics Data System (ADS)
Vnukov, A. A.; Shershnev, M. B.
2018-01-01
The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three methods of interpolation were studied, formalized and adapted to scale images. The result of the work is a program for scaling images by different methods. Comparison of the quality of scaling by different methods is given.
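As an illustration of row-parallel image scaling, here is a minimal sketch. The abstract does not name the three interpolation methods, so nearest-neighbour interpolation is an assumption chosen for brevity, and the names `scale_row` and `scale_nearest` are hypothetical. Output rows are independent of one another, so they are handed to a thread pool from Python's standard library.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_row(image, src_h, src_w, dst_h, dst_w, y):
    """Compute one output row by nearest-neighbour lookup."""
    src_y = min(y * src_h // dst_h, src_h - 1)
    return [image[src_y][min(x * src_w // dst_w, src_w - 1)]
            for x in range(dst_w)]


def scale_nearest(image, dst_h, dst_w, workers=4):
    """Scale a 2D image (list of rows) to dst_h x dst_w, one task per row."""
    src_h, src_w = len(image), len(image[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so rows come back top to bottom.
        return list(pool.map(
            lambda y: scale_row(image, src_h, src_w, dst_h, dst_w, y),
            range(dst_h)))


scaled = scale_nearest([[1, 2], [3, 4]], 4, 4)
```

In CPython, threads will not speed up this pure-Python loop because of the global interpreter lock; the sketch shows only how the work decomposes. A process pool or a compiled kernel would be needed to measure the relationship between execution time and degree of parallelization that the abstract describes.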
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
NASA Astrophysics Data System (ADS)
Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir
2016-12-01
The paper presents two different approaches to reducing the computation time of reachability sets. The first approach uses different data structures for storing the reachability sets in computer memory for calculation in single-threaded mode. The second approach is based on parallel algorithms that operate on the data structures from the first approach. Within the framework of this paper, a parallel algorithm for approximate reachability set calculation on a computer with SMP architecture is proposed. The results of numerical modelling are presented in tables which demonstrate the high efficiency of parallel computing technology and also show how computing time depends on the data structure used.
Parallel multiphase microflows: fundamental physics, stabilization methods and applications.
Aota, Arata; Mawatari, Kazuma; Kitamori, Takehiko
2009-09-07
Parallel multiphase microflows, which can integrate unit operations in a microchip under continuous flow conditions, are discussed. Fundamental physics, stabilization methods and some applications are shown.
Quantum vacuum interaction between two cosmic strings revisited
NASA Astrophysics Data System (ADS)
Muñoz-Castañeda, J. M.; Bordag, M.
2014-03-01
We reconsider the quantum vacuum interaction energy between two straight parallel cosmic strings. This problem was discussed several times in an approach treating both strings perturbatively and treating only one perturbatively. Here we point out that a simplifying assumption made by Bordag [Ann. Phys. (Berlin) 47, 93 (1990).] can be justified and show that, despite the global character of the background, the perturbative approach delivers a correct result. We consider the applicability of the scattering methods, developed in the past decade for the Casimir effect, for the cosmic string and find it not applicable. We calculate the scattering T-operator on one string. Finally, we consider the vacuum interaction of two strings when each carries a two-dimensional delta function potential.
Smart Cameras for Remote Science Survey
NASA Technical Reports Server (NTRS)
Thompson, David R.; Abbey, William; Allwood, Abigail; Bekker, Dmitriy; Bornstein, Benjamin; Cabrol, Nathalie A.; Castano, Rebecca; Estlin, Tara; Fuchs, Thomas; Wagstaff, Kiri L.
2012-01-01
Communication with remote exploration spacecraft is often intermittent and bandwidth is highly constrained. Future missions could use onboard science data understanding to prioritize downlink of critical features [1], draft summary maps of visited terrain [2], or identify targets of opportunity for followup measurements [3]. We describe a generic approach to classify geologic surfaces for autonomous science operations, suitable for parallelized implementations in FPGA hardware. We map these surfaces with texture channels - distinctive numerical signatures that differentiate properties such as roughness, pavement coatings, regolith characteristics, sedimentary fabrics and differential outcrop weathering. This work describes our basic image analysis approach and reports an initial performance evaluation using surface images from the Mars Exploration Rovers. Future work will incorporate these methods into camera hardware for real-time processing.
Ascent control studies of the 049 and ATP parallel burn solid rocket motor shuttle configurations
NASA Technical Reports Server (NTRS)
Ryan, R. S.; Mowery, D. K.; Hammer, M.; Weisler, A. C.
1972-01-01
The control authority approach is discussed as a major problem of the parallel burn solid shuttle configuration due to the many resulting system impacts regardless of the approach. The major trade studies and their results, which led to the recommendation of an SRB TVC control authority approach, are presented.
Electronic scraps--recovering of valuable materials from parallel wire cables.
de Araújo, Mishene Christie Pinheiro Bezerra; Chaves, Arthur Pinto; Espinosa, Denise Crocce Romano; Tenório, Jorge Alberto Soares
2008-11-01
Every year, the number of discarded electro-electronic products is increasing. For this reason recycling is needed, to avoid wasting non-renewable natural resources. The objective of this work is to study the recycling of materials from parallel wire cable through unit operations of mineral processing. Parallel wire cables are basically composed of polymer and copper. The following unit operations were tested: grinding, size classification, dense medium separation, electrostatic separation, scrubbing, panning, and elutriation. It was observed that the operations used obtained copper and PVC concentrates with a low degree of cross contamination. It was concluded that total liberation of the materials was accomplished after grinding to less than 3 mm, using a cage mill. Separation using panning and elutriation presented the best results in terms of recovery and cross contamination.
Monolithic Parallel Tandem Organic Photovoltaic Cell with Transparent Carbon Nanotube Interlayer
NASA Technical Reports Server (NTRS)
Tanaka, S.; Mielczarek, K.; Ovalle-Robles, R.; Wang, B.; Hsu, D.; Zakhidov, A. A.
2009-01-01
We demonstrate an organic photovoltaic cell with a monolithic tandem structure in parallel connection. Transparent multiwalled carbon nanotube sheets are used as an interlayer anode electrode for this parallel tandem. The characteristics of front and back cells are measured independently. The short circuit current density of the parallel tandem cell is larger than the currents of each individual cell. The wavelength dependence of photocurrent for the parallel tandem cell shows the superposition spectrum of the two spectral sensitivities of the front and back cells. The monolithic three-electrode photovoltaic cell indeed operates as a parallel tandem with improved efficiency.
Application of a Scalable, Parallel, Unstructured-Grid-Based Navier-Stokes Solver
NASA Technical Reports Server (NTRS)
Parikh, Paresh
2001-01-01
A parallel version of an unstructured-grid based Navier-Stokes solver, USM3Dns, previously developed for efficient operation on a variety of parallel computers, has been enhanced to incorporate upgrades made to the serial version. The resultant parallel code has been extensively tested on a variety of problems of aerospace interest and on two sets of parallel computers to understand and document its characteristics. An innovative grid renumbering construct and use of non-blocking communication are shown to produce superlinear computing performance. Preliminary results from parallelization of a recently introduced "porous surface" boundary condition are also presented.
JELC-LITE: Unconventional Instructional Design for Special Operations Training
NASA Technical Reports Server (NTRS)
Friedman, Mark
2012-01-01
Current special operations staff training is based on the Joint Event Life Cycle (JELC). It addresses operational level tasks in multi-week, live military exercises which are planned over a 12 to 18 month timeframe. As the military experiences changing global mission sets, shorter training events using distributed technologies will increasingly be needed to augment traditional training. JELC-Lite is a new approach for providing relevant training between large scale exercises. This new streamlined, responsive training model uses distributed and virtualized training technologies to establish simulated scenarios. It keeps proficiency levels closer to optimal levels -- thereby reducing the performance degradation inherent in periodic training. It can be delivered to military as well as under-reached interagency groups to facilitate agile, repetitive training events. JELC-Lite is described by four phases paralleling the JELC, differing mostly in scope and scale. It has been successfully used with a Theater Special Operations Command and fits well within the current environment of reduced personnel and financial resources.
A performance comparison of the IBM RS/6000 and the Astronautics ZS-1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, W.M.; Abraham, S.G.; Davidson, E.S.
1991-01-01
Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data in the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.
Three-dimensional laser microvision.
Shimotahira, H; Iizuka, K; Chu, S C; Wah, C; Costen, F; Yoshikuni, Y
2001-04-10
A three-dimensional (3-D) optical imaging system offering high resolution in all three dimensions, requiring minimum manipulation and capable of real-time operation, is presented. The system derives its capabilities from use of the superstructure grating laser source in the implementation of a laser step frequency radar for depth information acquisition. A synthetic aperture radar technique was also used to further enhance its lateral resolution as well as extend the depth of focus. High-speed operation was made possible by a dual computer system consisting of a host and a remote microcomputer supported by a dual-channel Small Computer System Interface parallel data transfer system. The system is capable of operating near real time. The 3-D display of a tunneling diode, a microwave integrated circuit, and a see-through image taken by the system operating near real time are included. The depth resolution is 40 μm; lateral resolution with a synthetic aperture approach is a fraction of a micrometer and that without it is approximately 10 μm.
NASA Astrophysics Data System (ADS)
Buchner, Johannes
2011-12-01
Scheduling, the task of producing a time table for resources and tasks, is well known to become more difficult the more resources are involved (an NP-hard problem). This is about to become an issue in radio astronomy as observatories consisting of hundreds to thousands of telescopes are planned and operated. The Square Kilometre Array (SKA), which Australia and New Zealand bid to host, is aiming for scales where current approaches -- in construction, operation but also scheduling -- are insufficient. Although manual scheduling is common today, the problem is becoming complicated by the demand for (1) independent sub-arrays doing simultaneous observations, which requires the scheduler to plan parallel observations and (2) dynamic re-scheduling on changed conditions. Both of these requirements apply to the SKA, especially in the construction phase. We review the scheduling approaches taken in the astronomy literature, as well as investigate techniques from human schedulers and today's observatories. The scheduling problem is specified in general for scientific observations and in particular on radio telescope arrays. Also taken into account is the fact that the observatory may be oversubscribed, requiring the scheduling problem to be integrated with a planning process. We solve this long-term scheduling problem using a time-based encoding that works in the very general case of observation scheduling. This research then compares algorithms from various approaches, including fast heuristics from CPU scheduling, Linear Integer Programming, Genetic algorithms, and Branch-and-Bound enumeration schemes. Measures include not only goodness of the solution, but also scalability and re-scheduling capabilities. In conclusion, we have identified a fast and good scheduling approach that allows (re-)scheduling difficult and changing problems by combining heuristics with a Genetic algorithm using block-wise mutation operations.
We are able to explain and eradicate two problems in the literature: the inability of a GA to properly improve schedules and the generation of schedules with frequent interruptions. Finally, we demonstrate the scheduling framework for several operating telescopes: (1) dynamic re-scheduling with the AUT Warkworth 12m telescope, (2) scheduling for the Australian Mopra 22m telescope and (3) scheduling for the Allen Telescope Array. Furthermore, we discuss the applicability of the presented scheduling framework to the Atacama Large Millimeter/submillimeter Array (ALMA, in construction) and the SKA. In particular, during the development phase of the SKA, this dynamic, scalable scheduling framework can accommodate changing conditions.
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
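To make the bit-serial SIMD idea concrete, here is a minimal software sketch. It is not the processor design from the abstract: the function name `bit_serial_add`, the 8-bit word width and the list-of-lanes layout are all assumptions for illustration. Each lane stands for one 1-bit processing element, and every lane executes the same full-adder step on the same bit position before the machine moves on to the next bit.

```python
def bit_serial_add(a_values, b_values, width=8):
    """Add pairs of unsigned integers one bit per step, all lanes in lockstep.

    Results wrap modulo 2**width, as in a fixed-width hardware adder.
    """
    lanes = len(a_values)
    carry = [0] * lanes   # one carry flip-flop per processing element
    result = [0] * lanes
    for bit in range(width):          # one broadcast instruction per bit
        for lane in range(lanes):     # conceptually simultaneous in hardware
            a = (a_values[lane] >> bit) & 1
            b = (b_values[lane] >> bit) & 1
            s = a ^ b ^ carry[lane]                        # full-adder sum
            carry[lane] = (a & b) | (carry[lane] & (a ^ b))  # full-adder carry
            result[lane] |= s << bit
    return result


sums = bit_serial_add([3, 100, 255], [5, 27, 1])
```

The time for one addition grows with the word width rather than with the number of lanes, which is the trade-off that makes very simple processing elements attractive in large arrays.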
Comparison between four dissimilar solar panel configurations
NASA Astrophysics Data System (ADS)
Suleiman, K.; Ali, U. A.; Yusuf, Ibrahim; Koko, A. D.; Bala, S. I.
2017-12-01
Several studies on photovoltaic systems have focused on how they operate and the energy required to operate them. Little attention has been paid to their configurations, the modeling of mean time to system failure, availability, cost benefit, and comparisons of parallel and series-parallel designs. In this research work, four system configurations were studied. Configuration I consists of two sub-components arranged in parallel with 24 V each, configuration II consists of four sub-components arranged logically in parallel with 12 V each, configuration III consists of four sub-components arranged in series-parallel with 8 V each, and configuration IV has six sub-components with 6 V each arranged in series-parallel. Comparative analysis was made using the Chapman-Kolmogorov method. Explicit expressions for the mean time to system failure, steady-state availability and cost benefit were derived for the comparison. A ranking method was used to determine the optimal configuration of the systems. Analytical and numerical results for system availability and mean time to system failure were obtained, and it was found that configuration I is the optimal configuration.
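For readers unfamiliar with mean time to failure in parallel and series arrangements, the sketch below gives the textbook closed forms under strong simplifying assumptions that are not taken from the paper: identical, independent, non-repairable components with a constant failure rate. The function names and the example rate are hypothetical. The paper's own models are derived with Chapman-Kolmogorov equations and are not reproduced here.

```python
def mttf_series(n, failure_rate):
    """MTTF of n identical components in series: the first failure ends it."""
    return 1.0 / (n * failure_rate)


def mttf_parallel(n, failure_rate):
    """MTTF of n identical components in active parallel: the last failure ends it.

    With exponential lifetimes this is (1/rate) * (1 + 1/2 + ... + 1/n).
    """
    return sum(1.0 / k for k in range(1, n + 1)) / failure_rate


# Hypothetical failure rate of 0.5 failures per year.
two_in_parallel = mttf_parallel(2, 0.5)
two_in_series = mttf_series(2, 0.5)
```

Under these assumptions, adding parallel units raises the mean time to failure with diminishing returns, while adding series units lowers it. A ranking like the one in the abstract also depends on repair, cost and availability terms that this sketch omits.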
Design, fabrication and control of origami robots
NASA Astrophysics Data System (ADS)
Rus, Daniela; Tolley, Michael T.
2018-06-01
Origami robots are created using folding processes, which provide a simple approach to fabricating a wide range of robot morphologies. Inspired by biological systems, engineers have started to explore origami folding in combination with smart material actuators to enable intrinsic actuation as a means to decouple design from fabrication complexity. The built-in crease structure of origami bodies has the potential to yield compliance and exhibit many soft body properties. Conventional fabrication of robots is generally a bottom-up assembly process with multiple low-level steps for creating subsystems that include manual operations and often multiple iterations. By contrast, natural systems achieve elegant designs and complex functionalities using top-down parallel transformation approaches such as folding. Folding in nature creates a wide spectrum of complex morpho-functional structures such as proteins and intestines and enables the development of structures such as flowers, leaves and insect wings. Inspired by nature, engineers have started to explore folding powered by embedded smart material actuators to create origami robots. The design and fabrication of origami robots exploits top-down, parallel transformation approaches to achieve elegant designs and complex functionalities. In this Review, we first introduce the concept of origami robotics and then highlight advances in design principles, fabrication methods, actuation, smart materials and control algorithms. Applications of origami robots for a variety of devices are investigated, and future directions of the field are discussed, examining both challenges and opportunities.
Vakalis, Stergios; Caligiuri, Carlo; Moustakas, Konstantinos; Malamis, Dimitris; Renzi, Massimiliano; Baratieri, Marco
2018-03-12
There is a growing market demand for small-scale biomass gasifiers that is driven by the economic incentives and the legislative framework. Small-scale gasifiers produce a gaseous fuel, commonly referred to as producer gas, with relatively low heating value. Thus, the most common energy conversion systems that are coupled with small-scale gasifiers are internal combustion engines. In order to increase the electrical efficiency, the operators choose dual fuel engines and mix the producer gas with diesel. The Wiebe function has been a valuable tool for assessing the efficiency of dual fuel internal combustion engines. This study introduces a thermodynamic model that works in parallel with the Wiebe function and calculates the emissions of the engines. This "vis-à-vis" approach takes into consideration the actual conditions inside the cylinders, as returned by the Wiebe function, and calculates the final thermodynamic equilibrium of the flue gases mixture. This approach aims to enhance the operation of dual fuel internal combustion engines by identifying the optimal operating conditions and, at the same time, to advance pollution control and minimize the environmental impact.
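The Wiebe function referred to above is commonly written as x_b(θ) = 1 − exp(−a·((θ − θ₀)/Δθ)^(m+1)), where x_b is the burned mass fraction at crank angle θ. The sketch below evaluates that general form. The function name, the default values a = 5 and m = 2, and the example crank angles are illustrative assumptions rather than parameters from the study, and the emissions model that the study couples to the function is not shown.

```python
import math


def wiebe_burn_fraction(theta, theta_start, duration, a=5.0, m=2.0):
    """Burned mass fraction at crank angle theta (same angular units throughout).

    theta_start is the start of combustion and duration its nominal length.
    a (efficiency parameter) and m (form factor) are illustrative defaults.
    """
    if theta <= theta_start:
        return 0.0  # nothing has burned before combustion starts
    progress = (theta - theta_start) / duration
    return 1.0 - math.exp(-a * progress ** (m + 1))


# Hypothetical case: combustion starts at 0 degrees and lasts 40 degrees.
fraction_mid = wiebe_burn_fraction(20.0, 0.0, 40.0)
fraction_end = wiebe_burn_fraction(40.0, 0.0, 40.0)
```

The curve rises monotonically from 0 towards 1, and with a = 5 it reaches roughly 99% at the end of the nominal combustion duration.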
A Parallel Particle Swarm Optimization Algorithm Accelerated by Asynchronous Evaluations
NASA Technical Reports Server (NTRS)
Venter, Gerhard; Sobieszczanski-Sobieski, Jaroslaw
2005-01-01
A parallel Particle Swarm Optimization (PSO) algorithm is presented. Particle swarm optimization is a fairly recent addition to the family of non-gradient based, probabilistic search algorithms that is based on a simplified social model and is closely tied to swarming theory. Although PSO algorithms present several attractive properties to the designer, they are plagued by high computational cost as measured by elapsed time. One approach to reduce the elapsed time is to make use of coarse-grained parallelization to evaluate the design points. Previous parallel PSO algorithms were mostly implemented in a synchronous manner, where all design points within a design iteration are evaluated before the next iteration is started. This approach leads to poor parallel speedup in cases where a heterogeneous parallel environment is used and/or where the analysis time depends on the design point being analyzed. This paper introduces an asynchronous parallel PSO algorithm that greatly improves the parallel efficiency. The asynchronous algorithm is benchmarked on a cluster assembled of Apple Macintosh G5 desktop computers, using the multi-disciplinary optimization of a typical transport aircraft wing as an example.
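The elapsed-time argument above can be illustrated with a small scheduling model. This is a hedged sketch, not the authors' algorithm: it only compares wall-clock time with and without a per-iteration barrier, ignores how an asynchronous swarm update would actually use partial results, and uses made-up evaluation times.

```python
import heapq

def synchronous_makespan(eval_times, n_workers, n_iterations):
    """Elapsed time when every iteration ends with a barrier."""
    total = 0.0
    for _ in range(n_iterations):
        workers = [0.0] * n_workers          # all workers start the iteration idle
        for t in eval_times:
            free_at = heapq.heappop(workers)  # earliest-free worker takes the point
            heapq.heappush(workers, free_at + t)
        total += max(workers)                 # wait for the slowest evaluation
    return total

def asynchronous_makespan(eval_times, n_workers, n_iterations):
    """Elapsed time when a free worker immediately starts the next point."""
    workers = [0.0] * n_workers
    for _ in range(n_iterations):
        for t in eval_times:
            free_at = heapq.heappop(workers)
            heapq.heappush(workers, free_at + t)
    return max(workers)

# Hypothetical case: one design point takes four times longer than the others
times = [1, 1, 1, 4]
sync_time = synchronous_makespan(times, n_workers=4, n_iterations=3)
async_time = asynchronous_makespan(times, n_workers=4, n_iterations=3)
```

In this toy case the barrier leaves three workers idle for most of each iteration, so the synchronous schedule takes noticeably longer than the barrier-free one.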
Using CLIPS in the domain of knowledge-based massively parallel programming
NASA Technical Reports Server (NTRS)
Dvorak, Jiri J.
1994-01-01
The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Howison, Mark; Bethel, E. Wes; Childs, Hank
2012-01-01
With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.
Scalable multi-objective control for large scale water resources systems under uncertainty
NASA Astrophysics Data System (ADS)
Giuliani, Matteo; Quinn, Julianne; Herman, Jonathan; Castelletti, Andrea; Reed, Patrick
2016-04-01
The use of mathematical models to support the optimal management of environmental systems has expanded rapidly in recent years due to advances in scientific knowledge of the natural processes, efficiency of the optimization techniques, and availability of computational resources. However, ongoing changes in climate and society introduce additional challenges for controlling these systems, ultimately motivating the emergence of complex models to explore key causal relationships and dependencies on uncontrolled sources of variability. In this work, we contribute a novel implementation of the evolutionary multi-objective direct policy search (EMODPS) method for controlling environmental systems under uncertainty. The proposed approach combines direct policy search (DPS) with hierarchical parallelization of multi-objective evolutionary algorithms (MOEAs) and offers a threefold advantage. First, the DPS simulation-based optimization can be combined with any simulation model and does not add any constraint on modeled information, allowing the use of exogenous information in conditioning the decisions. Moreover, the combination of DPS and MOEAs prompts the generation of Pareto approximate sets of solutions for up to 10 objectives, thus overcoming the decision biases produced by cognitive myopia, where narrow or restrictive definitions of optimality strongly limit the discovery of decision relevant alternatives. Finally, the use of large-scale MOEA parallelization improves the ability of the designed solutions to handle the uncertainty due to severe natural variability. The proposed approach is demonstrated on a challenging water resources management problem represented by the optimal control of a network of four multipurpose water reservoirs in the Red River basin (Vietnam).
As part of the medium-long term energy and food security national strategy, four large reservoirs have been constructed on the Red River tributaries, which are mainly operated for hydropower production, flood control, and water supply. Numerical results under historical as well as synthetically generated hydrologic conditions show that our approach is able to discover key system tradeoffs in the operations of the system. The ability of the algorithm to find near-optimal solutions increases with the number of islands in the adopted hierarchical parallelization scheme. In addition, although significant performance degradation is observed when the solutions designed over history are re-evaluated over synthetically generated inflows, we successfully reduced these vulnerabilities by identifying alternative solutions that are more robust to hydrologic uncertainties, while also addressing the tradeoffs across the Red River multi-sector services.
Parallel runway requirement analysis study. Volume 1: The analysis
NASA Technical Reports Server (NTRS)
Ebrahimi, Yaghoob S.
1993-01-01
The correlation of increased flight delays with the level of aviation activity is well recognized. A main contributor to these flight delays has been the capacity of airports. Though new airport and runway construction would significantly increase airport capacity, few programs of this type are currently underway, let alone planned, because of the high cost associated with such endeavors. Therefore, it is necessary to achieve the most efficient and cost effective use of existing fixed airport resources through better planning and control of traffic flows. In fact, during the past few years the FAA has initiated such an airport capacity program designed to provide additional capacity at existing airports. Some of the improvements that that program has generated thus far have been based on new Air Traffic Control procedures, terminal automation, additional Instrument Landing Systems, improved controller display aids, and improved utilization of multiple runways/Instrument Meteorological Conditions (IMC) approach procedures. A useful element to understanding potential operational capacity enhancements at high demand airports has been the development and use of an analysis tool called The PLAND_BLUNDER (PLB) Simulation Model. The objective for building this simulation was to develop a parametric model that could be used for analysis in determining the minimum safety level of parallel runway operations for various parameters representing the airplane, navigation, surveillance, and ATC system performance. 
This simulation is useful as: a quick and economical evaluation of existing environments that are experiencing IMC delays, an efficient way to study and validate proposed procedure modifications, an aid in evaluating requirements for new airports or new runways in old airports, a simple, parametric investigation of a wide range of issues and approaches, an ability to tradeoff air and ground technology and procedures contributions, and a way of considering probable blunder mechanisms and range of blunder scenarios. This study describes the steps of building the simulation and considers the input parameters, assumptions and limitations, and available outputs. Validation results and sensitivity analysis are addressed as well as outlining some IMC and Visual Meteorological Conditions (VMC) approaches to parallel runways. Also, present and future applicable technologies (e.g., Digital Autoland Systems, Traffic Collision and Avoidance System II, Enhanced Situational Awareness System, Global Positioning Systems for Landing, etc.) are assessed and recommendations made.
NASA Technical Reports Server (NTRS)
Campbell, R. H.; Essick, Ray B.; Johnston, Gary; Kenny, Kevin; Russo, Vince
1987-01-01
Project EOS is studying the problems of building adaptable real-time embedded operating systems for the scientific missions of NASA. Choices (A Class Hierarchical Open Interface for Custom Embedded Systems) is an operating system designed and built by Project EOS to address the following specific issues: the software architecture for adaptable embedded parallel operating systems, the achievement of high-performance and real-time operation, the simplification of interprocess communications, the isolation of operating system mechanisms from one another, and the separation of mechanisms from policy decisions. Choices is written in C++ and runs on a ten processor Encore Multimax. The system is intended for use in constructing specialized computer applications and research on advanced operating system features including fault tolerance and parallelism.
Exploring the Feasibility of a DNA Computer: Design of an ALU Using Sticker-Based DNA Model.
Sarkar, Mayukh; Ghosal, Prasun; Mohanty, Saraju P
2017-09-01
Since its inception, DNA computing has advanced to offer an extremely powerful, energy-efficient emerging technology for solving hard computational problems with its inherent massive parallelism and extremely high data density. It would be much more powerful and general purpose if combined, through a suitable ALU, with the well-known algorithmic solutions that exist for conventional computing architectures. Thus, a specifically designed DNA Arithmetic and Logic Unit (ALU) that can address operations suitable for both domains can narrow the gap between the two. An ALU must be able to perform all possible logic operations, including NOT, OR, AND, XOR, NOR, NAND, and XNOR; compare and shift operations; and integer and floating point arithmetic operations (addition, subtraction, multiplication, and division). In this paper, the design of an ALU is proposed using the sticker-based DNA model, with an experimental feasibility analysis. The novelties of this paper are manifold. First, the integer arithmetic operations performed here are 2's complement arithmetic, and the floating point operations follow the IEEE 754 floating point format, closely resembling a conventional ALU. Also, the output of each operation can be reused for any subsequent operation, so any algorithm or program logic that users can think of can be implemented directly on the DNA computer without modification. Second, once the basic operations of the sticker model can be automated, the implementations proposed in this paper become highly suitable for designing a fully automated ALU. Third, the proposed approaches are easy to implement. Finally, these approaches can work on sufficiently large binary numbers.
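To make the 2's complement arithmetic mentioned above concrete, here is a minimal sketch of how addition and subtraction reduce to per-bit XOR, AND, OR, and NOT. It is ordinary software arithmetic on bit lists, not a model of the sticker-based DNA operations; all function names are hypothetical.

```python
def to_bits(value, width):
    """Two's complement encoding as a bit list, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Decode an LSB-first bit list as a signed two's complement integer."""
    unsigned = sum(b << i for i, b in enumerate(bits))
    return unsigned - (1 << len(bits)) if bits[-1] else unsigned

def ripple_add(a_bits, b_bits, carry_in=0):
    """Add two equal-width bit lists one bit at a time (ripple carry)."""
    out, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)             # sum bit
        carry = (a & b) | (carry & (a ^ b))   # carry into the next bit
    return out  # the final carry is dropped, so results wrap modulo 2**width

def subtract(a_bits, b_bits):
    """Compute a - b as a + NOT(b) + 1."""
    return ripple_add(a_bits, [1 - b for b in b_bits], carry_in=1)

total = from_bits(ripple_add(to_bits(5, 8), to_bits(-3, 8)))
difference = from_bits(subtract(to_bits(5, 8), to_bits(7, 8)))
```

Because the output of each function is again a bit list of the same width, results can be fed straight into a further operation, which mirrors the reuse property the abstract emphasizes.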
Increasing airport capacity with modified IFR approach procedures for close-spaced parallel runways
DOT National Transportation Integrated Search
2001-01-01
Because of wake turbulence considerations, current instrument approach procedures treat close-spaced (i.e., less than 2,500 feet apart) parallel runways as a single runway. This restriction is designed to assure safety for all aircraft types u...
NASA Astrophysics Data System (ADS)
Chaves-González, José M.; Vega-Rodríguez, Miguel A.; Gómez-Pulido, Juan A.; Sánchez-Pérez, Juan M.
2011-08-01
This article analyses the use of a novel parallel evolutionary strategy to solve complex optimization problems. The work developed here has been focused on a relevant real-world problem from the telecommunication domain to verify the effectiveness of the approach. The problem, known as frequency assignment problem (FAP), basically consists of assigning a very small number of frequencies to a very large set of transceivers used in a cellular phone network. Real data FAP instances are very difficult to solve due to the NP-hard nature of the problem, therefore using an efficient parallel approach which makes the most of different evolutionary strategies can be considered as a good way to obtain high-quality solutions in short periods of time. Specifically, a parallel hyper-heuristic based on several meta-heuristics has been developed. After a complete experimental evaluation, results prove that the proposed approach obtains very high-quality solutions for the FAP and beats any other result published.
NASA Technical Reports Server (NTRS)
Waller, Marvin C. (Editor); Scanlon, Charles H. (Editor)
1996-01-01
A Government and Industry workshop on Flight-Deck-Centered Parallel Runway Approaches in Instrument Meteorological Conditions (IMC) was conducted October 29, 1996 at the NASA Langley Research Center. This document contains the slides and records of the proceedings of the workshop. The purpose of the workshop was to disclose to the National airspace community the status of ongoing NASA R&D to address the closely spaced parallel runway problem in IMC and to seek advice and input on direction of future work to assure an optimized research approach. The workshop also included a description of a Paired Approach Concept which is being studied at United Airlines for application at the San Francisco International Airport.
Data Parallel Bin-Based Indexing for Answering Queries on Multi-Core Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gosink, Luke; Wu, Kesheng; Bethel, E. Wes
2009-06-02
The multi-core trend in CPUs and general purpose graphics processing units (GPUs) offers new opportunities for the database community. The increase of cores at exponential rates is likely to affect virtually every server and client in the coming decade, and presents database management systems with a huge, compelling disruption that will radically change how processing is done. This paper presents a new parallel indexing data structure for answering queries that takes full advantage of the increasing thread-level parallelism emerging in multi-core architectures. In our approach, our Data Parallel Bin-based Index Strategy (DP-BIS) first bins the base data, and then partitions and stores the values in each bin as a separate, bin-based data cluster. In answering a query, the procedures for examining the bin numbers and the bin-based data clusters offer the maximum possible level of concurrency; each record is evaluated by a single thread and all threads are processed simultaneously in parallel. We implement and demonstrate the effectiveness of DP-BIS on two multi-core architectures: a multi-core CPU and a GPU. The concurrency afforded by DP-BIS allows us to fully utilize the thread-level parallelism provided by each architecture; for example, our GPU-based DP-BIS implementation simultaneously evaluates over 12,000 records with an equivalent number of concurrently executing threads. In comparing DP-BIS's performance across these architectures, we show that the GPU-based DP-BIS implementation requires significantly less computation time to answer a query than the CPU-based implementation. We also demonstrate in our analysis that DP-BIS provides better overall performance than the commonly utilized CPU and GPU-based projection index.
Finally, due to data encoding, we show that DP-BIS accesses significantly smaller amounts of data than index strategies that operate solely on a column's base data; this smaller data footprint is critical for parallel processors that possess limited memory resources (e.g., GPUs).
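The general idea of a bin-based index, as described above, can be sketched in a few lines. This is a sequential illustration of binning plus per-bin data clusters, not the DP-BIS implementation: the function names, the half-open bin convention, and the sample data are assumptions, and the per-record thread parallelism of the paper is not modeled.

```python
from bisect import bisect_right

def build_bins(values, edges):
    """Group (record_id, value) pairs by bin; bin i covers [edges[i], edges[i+1]).

    Assumes every value satisfies edges[0] <= value < edges[-1].
    """
    clusters = [[] for _ in range(len(edges) - 1)]
    for record_id, v in enumerate(values):
        clusters[bisect_right(edges, v) - 1].append((record_id, v))
    return clusters

def range_query(clusters, edges, low, high):
    """Return sorted record ids whose value satisfies low <= value < high."""
    hits = []
    for b, cluster in enumerate(clusters):
        lo_edge, hi_edge = edges[b], edges[b + 1]
        if hi_edge <= low or lo_edge >= high:
            continue                                  # bin entirely outside the query
        if low <= lo_edge and hi_edge <= high:
            hits.extend(rid for rid, _ in cluster)    # bin entirely inside the query
        else:
            # Boundary bin: only here are the stored values themselves examined
            hits.extend(rid for rid, v in cluster if low <= v < high)
    return sorted(hits)

values = [0.5, 3.2, 7.9, 4.4, 9.1, 2.0]   # hypothetical column data
edges = [0, 2, 4, 6, 8, 10]               # hypothetical bin boundaries
clusters = build_bins(values, edges)
matches = range_query(clusters, edges, low=2, high=5)
```

Only boundary bins require touching stored values, which is one way to read the abstract's claim that a bin-based strategy accesses less data than an index working on the raw column.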
How do strategic decisions and operative practices affect operating room productivity?
Peltokorpi, Antti
2011-12-01
Surgical operating rooms are cost-intensive parts of health service production. Managing operating units efficiently is essential when hospitals and healthcare systems aim to maximize health outcomes with limited resources. Previous research on operating room management has focused on the effect of management practices and decisions on efficiency, mainly using modeling approaches or before-after analyses in single-hospital cases. The purpose of this research is to analyze the synergic effect of strategic decisions and operative management practices on operating room productivity and to use a multiple case study method enabling statistical hypothesis testing with empirical data. Eleven hypotheses that propose connections between the use of strategic and operative practices and productivity were tested in a multi-hospital study that included 26 units. The results indicate that operative practices, such as personnel management, case scheduling and performance measurement, affect productivity more strongly than do strategic decisions that relate to, e.g., units' size, scope or academic status. Units with different strategic positions should apply different operative practices: focused hospital units benefit most from sophisticated case scheduling and parallel processing, whereas central and ambulatory units should apply flexible working hours, incentives and multi-skilled personnel. Operating units should be more active in applying management practices that are appropriate for their strategic orientation.
A real-space stochastic density matrix approach for density functional electronic structure.
Beck, Thomas L
2015-12-21
The recent development of real-space grid methods has led to more efficient, accurate, and adaptable approaches for large-scale electrostatics and density functional electronic structure modeling. With the incorporation of multiscale techniques, linear-scaling real-space solvers are possible for density functional problems if localized orbitals are used to represent the Kohn-Sham energy functional. These methods still suffer from high computational and storage overheads, however, due to extensive matrix operations related to the underlying wave function grid representation. In this paper, an alternative stochastic method is outlined that aims to solve directly for the one-electron density matrix in real space. In order to illustrate aspects of the method, model calculations are performed for simple one-dimensional problems that display some features of the more general problem, such as spatial nodes in the density matrix. This orbital-free approach may prove helpful considering a future involving increasingly parallel computing architectures. Its primary advantage is the near-locality of the random walks, allowing for simultaneous updates of the density matrix in different regions of space partitioned across the processors. In addition, it allows for testing and enforcement of the particle number and idempotency constraints through stabilization of a Feynman-Kac functional integral as opposed to the extensive matrix operations in traditional approaches.
NASA Technical Reports Server (NTRS)
1974-01-01
The proposed spacecraft consists of a bus module, containing all subsystems required for support of the sensors, and a payload module containing all of the sensor equipment. The two modules are bolted together to form the spacecraft, and electrical interfaces are accomplished via mated connectors at the interface plane. This approach permits independent parallel assembly and test operations on each module up until mating for final spacecraft integration and test operations. Proposed program schedules recognize the need to refine sensor/spacecraft interfaces prior to proceeding with procurement, reflect the lead times estimated by suppliers for delivery of equipment, reflect a comprehensive test program, and provide flexibility for unanticipated problems. The spacecraft systems are described in detail along with aerospace ground equipment, ground handling equipment, the launch vehicle, imaging radar incorporation, and systems tests.
Tracking moving radar targets with parallel, velocity-tuned filters
Bickel, Douglas L.; Harmony, David W.; Bielek, Timothy P.; Hollowell, Jeff A.; Murray, Margaret S.; Martinez, Ana
2013-04-30
Radar data associated with radar illumination of a movable target is processed to monitor motion of the target. A plurality of filter operations are performed in parallel on the radar data so that each filter operation produces target image information. The filter operations are defined to have respectively corresponding velocity ranges that differ from one another. The target image information produced by one of the filter operations represents the target more accurately than the target image information produced by the remainder of the filter operations when a current velocity of the target is within the velocity range associated with the one filter operation. In response to the current velocity of the target being within the velocity range associated with the one filter operation, motion of the target is tracked based on the target image information produced by the one filter operation.
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives
NASA Astrophysics Data System (ADS)
Usha, S.; Subramani, C.
2018-04-01
Induction motors are highly non-linear and have complex time-varying dynamics, which makes their speed control a challenging issue in industry. However, owing to recent advances in power electronic devices and intelligent controllers, speed control of the induction motor can now account for these non-linear characteristics. Conventionally, a single inverter is used to run one induction motor in industry. In traction applications, two or more induction motors are operated in parallel to reduce the size and cost of the motors. In this application, the parallel-connected induction motors can be driven by a single inverter unit. Stability problems may arise in parallel operation under low-speed operating conditions. Hence, the speed deviations should be reduced with the help of suitable controllers. The speed control of the parallel-connected system is performed by a PID controller and a fuzzy logic controller. In this paper, the speed responses of an induction motor rated at 1 HP, 1440 rpm, and 50 Hz with these controllers are compared in terms of time-domain specifications. The stability analysis of the system is also performed under low speed using the MATLAB platform. A hardware model is developed for speed control using the fuzzy logic controller, which exhibited superior performance over the other controller.
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
Zhao, Jing; Zong, Haili
2018-01-01
In this paper, we propose parallel and cyclic iterative algorithms for solving the multiple-set split equality common fixed-point problem of firmly quasi-nonexpansive operators. We also combine the cyclic and parallel iterative methods and propose two mixed iterative algorithms. None of our algorithms needs any prior information about the operator norms. Under mild assumptions, we prove weak convergence of the proposed iterative sequences in Hilbert spaces. As applications, we obtain several iterative algorithms to solve the multiple-set split equality problem.
Short-Term Load Forecasting Based Automatic Distribution Network Reconfiguration: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Ding, Fei; Zhang, Yingchen
In the traditional dynamic network reconfiguration study, the optimal topology is determined at every scheduled time point by using the real load data measured at that time. The development of load forecasting techniques can provide accurate predictions of future load power and more information about load changes. With the inclusion of load forecasting, the optimal topology can be determined based on the predicted load conditions over a longer time period instead of a snapshot of the load at the time when the reconfiguration happens, and thus it can provide information to the distribution system operator (DSO) to better operate the system reconfiguration to achieve optimal solutions. Thus, this paper proposes a short-term load forecasting based approach for automatically reconfiguring distribution systems in a dynamic and pre-event manner. Specifically, a short-term and high-resolution distribution system load forecasting approach is proposed with a support vector regression (SVR) based forecaster and parallel parameter optimization. The network reconfiguration problem is then solved by using the forecasted load continuously to determine the optimal network topology with the minimum loss at the future time. The simulation results validate and evaluate the proposed approach.
Proximity operations analysis: Retrieval of the solar maximum mission observatory
NASA Technical Reports Server (NTRS)
Yglesias, J. A.
1980-01-01
Retrieval of the solar maximum mission (SMM) observatory is feasible in terms of orbiter primary reaction control system (PRCS) plume disturbance of the SMM, orbiter propellant consumed, and flight time required. Man-in-loop simulations will be required to validate these operational techniques before the verification process is complete. Candidate approach and flyaround techniques were developed that allow the orbiter to attain the proper alinement with the SMM for clear access to the grapple fixture (GF) prior to grappling. Because the SMM has very little control authority (approximately 14.8 pound-foot-seconds in two axes and rate-damped in the third) it is necessary to inhibit all +Z (upfiring) PRCS jets on the orbiter to avoid tumbling the SMM. A profile involving a V-bar approach and an out-of-plane flyaround appears to be the best choice and is recommended at this time. The flyaround technique consists of alining the +X-axes of the two vehicles parallel with each other and then flying the orbiter around the SMM until the GF is in view. The out-of-plane flyaround technique is applicable to any inertially stabilized payload, and the entire final approach profile could be considered as standard for most retrieval missions.
Multibus-based parallel processor for simulation
NASA Technical Reports Server (NTRS)
Ogrady, E. P.; Wang, C.-H.
1983-01-01
A Multibus-based parallel processor simulation system is described. The system is intended to serve as a vehicle for gaining hands-on experience, testing system and application software, and evaluating parallel processor performance during development of a larger system based on the horizontal/vertical-bus interprocessor communication mechanism. The prototype system consists of up to seven Intel iSBC 86/12A single-board computers which serve as processing elements, a multiple transmission controller (MTC) designed to support system operation, and an Intel Model 225 Microcomputer Development System which serves as the user interface and input/output processor. All components are interconnected by a Multibus/IEEE 796 bus. An important characteristic of the system is that it provides a mechanism for a processing element to broadcast data to other selected processing elements. This parallel transfer capability is provided through the design of the MTC and a minor modification to the iSBC 86/12A board. The operation of the MTC, the basic hardware-level operation of the system, and pertinent details about the iSBC 86/12A and the Multibus are described.
Evaluation of fault-tolerant parallel-processor architectures over long space missions
NASA Technical Reports Server (NTRS)
Johnson, Sally C.
1989-01-01
The impact of a five-year space mission environment on fault-tolerant parallel processor architectures is examined. The target application is a Strategic Defense Initiative (SDI) satellite requiring 256 parallel processors to provide the computation throughput. The reliability requirements are that the system still be operational after five years with 0.99 probability and that the probability of system failure during one-half hour of full operation be less than 10^-7. The fault tolerance features an architecture must possess to meet these reliability requirements are presented, many potential architectures are briefly evaluated, and one candidate architecture, the Charles Stark Draper Laboratory's Fault-Tolerant Parallel Processor (FTPP) is evaluated in detail. A methodology for designing a preliminary system configuration to meet the reliability and performance requirements of the mission is then presented and demonstrated by designing an FTPP configuration.
Fast parallel molecular algorithms for DNA-based computation: factoring integers.
Chang, Weng-Long; Guo, Minyi; Ho, Michael Shan-Hui
2005-06-01
The RSA public-key cryptosystem is an algorithm that converts input data into an unrecognizable encrypted form and converts the encrypted data back into its original, decrypted form. The security of the RSA public-key cryptosystem is based on the difficulty of factoring the product of two large prime numbers. This paper demonstrates how to factor the product of two large prime numbers, a breakthrough in basic biological operations using a molecular computer. In order to achieve this, we propose three DNA-based algorithms for a parallel subtractor, a parallel comparator, and parallel modular arithmetic that formally verify our designed molecular solutions for factoring the product of two large prime numbers. Furthermore, this work indicates that public-key cryptosystems are perhaps insecure and also presents clear evidence of the ability of molecular computing to perform complicated mathematical operations.
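The link between factoring and RSA security stated above can be shown with a toy example. This is a conventional software sketch with deliberately tiny, illustrative numbers; it uses trial division rather than anything resembling the paper's DNA-based subtractor, comparator, or modular arithmetic, and the function names are hypothetical.

```python
def factor_semiprime(n):
    """Find a nontrivial factorization by trial division (tiny n only)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no nontrivial factor")

def private_exponent(e, p, q):
    """Private exponent d satisfying e * d = 1 modulo (p - 1) * (q - 1)."""
    return pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+

n, e = 3233, 17                # toy public key; real moduli are vastly larger
message = 65
ciphertext = pow(message, e, n)

# An attacker who can factor n recovers the private key and the message
p, q = factor_semiprime(n)
d = private_exponent(e, p, q)
recovered = pow(ciphertext, d, n)
```

Once the two primes are known, deriving the private exponent is a single modular inversion, which is why any method that factors large semiprimes efficiently would undermine the scheme.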
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach are demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach are demonstrated on several parallel computers.
Cost related sensitivity analysis for optimal operation of a grid-parallel PEM fuel cell power plant
NASA Astrophysics Data System (ADS)
El-Sharkh, M. Y.; Tanrioven, M.; Rahman, A.; Alam, M. S.
Fuel cell power plants (FCPP) as a combined source of heat, power and hydrogen (CHP&H) can be considered as a potential option to supply both thermal and electrical loads. Hydrogen produced from the FCPP can be stored for future use of the FCPP or can be sold for profit. In such a system, tariff rates for purchasing or selling electricity, the fuel cost for the FCPP/thermal load, and hydrogen selling price are the main factors that affect the operational strategy. This paper presents a hybrid evolutionary programming and Hill-Climbing based approach to evaluate the impact of change of the above mentioned cost parameters on the optimal operational strategy of the FCPP. The optimal operational strategy of the FCPP for different tariffs is achieved through the estimation of the following: hourly generated power, the amount of thermal power recovered, power trade with the local grid, and the quantity of hydrogen that can be produced. Results show the importance of optimizing system cost parameters in order to minimize overall operating cost.
A review of aircraft turnaround operations and simulations
NASA Astrophysics Data System (ADS)
Schmidt, Michael
2017-07-01
The ground operational processes are the connecting element between aircraft en-route operations and airport infrastructure. An efficient aircraft turnaround is an essential component of airline success, especially for regional and short-haul operations. It is imperative that advancements in ground operations, specifically process reliability and passenger comfort, are developed while dealing with increasing passenger traffic in the next years. This paper provides an introduction to aircraft ground operations focusing on the aircraft turnaround and passenger processes. Furthermore, key challenges for current aircraft operators, such as airport capacity constraints, schedule disruptions and the increasing cost pressure, are highlighted. A review of the conducted studies and conceptual work in this field shows pathways for potential process improvements. Promising approaches attempt to reduce apron traffic and parallelize passenger processes and taxiing. The application of boarding strategies and novel cabin layouts focusing on aisle, door and seat, are options to shorten the boarding process inside the cabin. A summary of existing modeling and simulation frameworks give an insight into state-of-the-art assessment capabilities as it concerns advanced concepts. They are the prerequisite to allow a holistic assessment during the early stages of the preliminary aircraft design process and to identify benefits and drawbacks for all involved stakeholders.
Debugging Fortran on a shared memory machine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, T.R.; Padua, D.A.
1987-01-01
Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
Dispatching packets on a global combining network of a parallel computer
Almasi, Gheorghe [Ardsley, NY; Archer, Charles J [Rochester, MN
2011-07-19
Methods, apparatus, and products are disclosed for dispatching packets on a global combining network of a parallel computer comprising a plurality of nodes connected for data communications using the network capable of performing collective operations and point to point operations that include: receiving, by an origin system messaging module on an origin node from an origin application messaging module on the origin node, a storage identifier and an operation identifier, the storage identifier specifying storage containing an application message for transmission to a target node, and the operation identifier specifying a message passing operation; packetizing, by the origin system messaging module, the application message into network packets for transmission to the target node, each network packet specifying the operation identifier and an operation type for the message passing operation specified by the operation identifier; and transmitting, by the origin system messaging module, the network packets to the target node.
Covering Resilience: A Recent Development for Binomial Checkpointing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walther, Andrea; Narayanan, Sri Hari Krishna
In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, required, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional with the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massive parallel simulations and adjoint calculations where the mean time between failure of the large scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation and discuss first numerical results.
Program For Parallel Discrete-Event Simulation
NASA Technical Reports Server (NTRS)
Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.
1991-01-01
User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
Real-time processing is becoming increasingly important in many image processing applications. Image segmentation is one of the most fundamental tasks in image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it is necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelizations of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare OpenMP (an application programming interface for multi-processing) with Pthreads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
Application of an impedance matching transformer to a plasma focus.
Bures, B L; James, C; Krishnan, M; Adler, R
2011-10-01
A plasma focus was constructed using an impedance matching transformer to improve power transfer between the pulse power and the dynamic plasma load. The system relied on two switches and twelve transformer cores to produce a 100 kA pulse in short circuit on the secondary at 27 kV on the primary with 110 J stored. With the two transformer systems in parallel, the Thevenin equivalent circuit parameters on the secondary side of the driver are: C = 10.9 μF, V(0) = 4.5 kV, L = 17 nH, and R = 5 mΩ. An equivalent direct drive circuit would require a large number of switches in parallel, to achieve the same Thevenin equivalent. The benefits of this approach are replacement of consumable switches with non-consumable transformer cores, reduction of the driver inductance and resistance as viewed by the dynamic load, and reduction of the stored energy to produce a given peak current. The system is designed to operate at 100 Hz, so minimizing the stored energy results in less load on the thermal management system. When operated at 1 Hz, the neutron yield from the transformer matched plasma focus was similar to the neutron yield from a conventional (directly driven) plasma focus at the same peak current.
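As a rough consistency check on the figures quoted above, the stored energy and short-circuit peak current can be estimated from the Thevenin parameters. This is a minimal sketch, assuming the driver behaves as an ideal series RLC discharge into a short circuit; that model and the variable names are assumptions for illustration, not the authors' analysis.

```python
import math

# Thevenin parameters quoted in the abstract (secondary side of the driver).
C = 10.9e-6   # capacitance in farads
V0 = 4.5e3    # initial voltage in volts
L = 17e-9     # inductance in henries
R = 5e-3      # resistance in ohms

# Energy stored in the equivalent capacitor: E = C * V0^2 / 2.
stored_energy = 0.5 * C * V0 ** 2

# Assumed model: underdamped series RLC discharge,
# i(t) = V0 / (wd * L) * exp(-alpha * t) * sin(wd * t).
alpha = R / (2.0 * L)                    # damping rate in 1/s
w0 = 1.0 / math.sqrt(L * C)              # undamped angular frequency in rad/s
wd = math.sqrt(w0 ** 2 - alpha ** 2)     # damped angular frequency in rad/s

# The current peaks where di/dt = 0, i.e. tan(wd * t) = wd / alpha.
t_peak = math.atan(wd / alpha) / wd
peak_current = V0 / (wd * L) * math.exp(-alpha * t_peak) * math.sin(wd * t_peak)

print(f"stored energy ~ {stored_energy:.1f} J")
print(f"peak current  ~ {peak_current / 1e3:.0f} kA")
```

Under these assumptions the estimate lands close to the 110 J and 100 kA stated in the abstract, which suggests the quoted Thevenin parameters are mutually consistent.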
Recent advances in characterisation of subsonic axisymmetric nozzles
NASA Astrophysics Data System (ADS)
Tesař, Václav
2018-06-01
Nozzles are devices generating jets. They are widely used in fluidics and also in active control of flows past bodies. Because a nozzle is practically always a component of a larger system, design and optimisation of that system need a characterisation of nozzle properties by an invariant quantity. Perhaps surprisingly, no suitable invariant has been so far introduced. This article surveys approaches to characterisation quantities and presents several examples of their typical use in systems such as parallel operation of two nozzles, matching a nozzle to its fluid supply source, apparent resistance increase in flows with pulsation, and the secondary invariants of a family of quasi-similar nozzles.
Applications considerations in the system design of highly concurrent multiprocessors
NASA Technical Reports Server (NTRS)
Lundstrom, Stephen F.
1987-01-01
A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Estevez, Ivan; Concept Scientific Instruments, ZA de Courtaboeuf, 2 rue de la Terre de Feu, 91940 Les Ulis; Chrétien, Pascal
2014-02-24
On the basis of a home-made nanoscale impedance measurement device associated with a commercial atomic force microscope, a specific operating process is proposed in order to improve absolute (in the sense of “nonrelative”) capacitance imaging by drastically reducing the parasitic effects due to stray capacitance, surface topography, and sample tilt. The method, combining a two-pass image acquisition with the exploitation of approach curves, has been validated on sets of calibration samples consisting of square parallel plate capacitors for which theoretical capacitance values were numerically calculated.
Massively parallel multicanonical simulations
NASA Astrophysics Data System (ADS)
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 10^4 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of these processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At the other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.
Single-agent parallel window search
NASA Technical Reports Server (NTRS)
Powley, Curt; Korf, Richard E.
1991-01-01
Parallel window search is applied to single-agent problems by having different processes simultaneously perform iterations of Iterative-Deepening-A* (IDA*) on the same problem but with different cost thresholds. This approach is limited by the time to perform the goal iteration. To overcome this disadvantage, the authors consider node ordering. They discuss how global node ordering by minimum h among nodes with equal f = g + h values can reduce the time complexity of serial IDA* by reducing the time to perform the iterations prior to the goal iteration. Finally, the two ideas of parallel window search and node ordering are combined to eliminate the weaknesses of each approach while retaining the strengths. The resulting approach, called simply parallel window search, can be used to find a near-optimal solution quickly, improve the solution until it is optimal, and then finally guarantee optimality, depending on the amount of time available.
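The threshold-bounded iterations described above can be illustrated with a small serial sketch. It shows a textbook IDA* in which each iteration is a depth-first search pruned at a cost threshold; the "windows" at the end simply run that bounded search with several thresholds one after another, whereas parallel window search would hand each threshold to a different process. The toy graph, heuristic values, and function names are hypothetical, and node ordering is not implemented here.

```python
import math

def bounded_dfs(node, g, threshold, goal, neighbors, h, path):
    """One IDA* iteration: depth-first search pruned where f = g + h > threshold.

    Returns (solution path or None, f value). On failure the f value is the
    smallest f that exceeded the threshold, i.e. the next threshold to try.
    """
    f = g + h(node)
    if f > threshold:
        return None, f
    if node == goal:
        return list(path), f
    next_threshold = math.inf
    for child, cost in neighbors(node):
        if child in path:          # skip cycles along the current path
            continue
        path.append(child)
        found, t = bounded_dfs(child, g + cost, threshold, goal, neighbors, h, path)
        path.pop()
        if found is not None:
            return found, t
        next_threshold = min(next_threshold, t)
    return None, next_threshold

def ida_star(start, goal, neighbors, h):
    """Serial IDA*: repeat the bounded search with increasing thresholds."""
    threshold = h(start)
    while threshold < math.inf:
        found, t = bounded_dfs(start, 0, threshold, goal, neighbors, h, [start])
        if found is not None:
            return found, t
        threshold = t
    return None, math.inf

# Hypothetical toy problem: weighted directed graph and an admissible heuristic.
GRAPH = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
H = {"A": 2, "B": 2, "C": 1, "D": 0}

def neighbors(node):
    return GRAPH[node]

def h(node):
    return H[node]

solution, cost = ida_star("A", "D", neighbors, h)

# Each threshold is an independent "window"; here they run one after another.
windows = {t: bounded_dfs("A", 0, t, "D", neighbors, h, ["A"])[0] for t in (2, 3, 4)}
```

Because each window only needs the start state and its own threshold, the windows share no state, which is what makes assigning them to separate processes straightforward.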
3D hyperpolarized C-13 EPI with calibrationless parallel imaging
NASA Astrophysics Data System (ADS)
Gordon, Jeremy W.; Hansen, Rie B.; Shin, Peter J.; Feng, Yesu; Vigneron, Daniel B.; Larson, Peder E. Z.
2018-04-01
With the translation of metabolic MRI with hyperpolarized 13C agents into the clinic, imaging approaches will require large volumetric FOVs to support clinical applications. Parallel imaging techniques will be crucial to increasing volumetric scan coverage while minimizing RF requirements and temporal resolution. Calibrationless parallel imaging approaches are well-suited for this application because they eliminate the need to acquire coil profile maps or auto-calibration data. In this work, we explored the utility of a calibrationless parallel imaging method (SAKE) and corresponding sampling strategies to accelerate and undersample hyperpolarized 13C data using 3D blipped EPI acquisitions and multichannel receive coils, and demonstrated its application in a human study of [1-13C]pyruvate metabolism.
A comparative study of serial and parallel aeroelastic computations of wings
NASA Technical Reports Server (NTRS)
Byun, Chansup; Guruswamy, Guru P.
1994-01-01
A procedure for computing the aeroelasticity of wings on parallel multiple-instruction, multiple-data (MIMD) computers is presented. In this procedure, fluids are modeled using Euler equations, and structures are modeled using modal or finite element equations. The procedure is designed in such a way that each discipline can be developed and maintained independently by using a domain decomposition approach. In the present parallel procedure, each computational domain is scalable. A parallel integration scheme is used to compute aeroelastic responses by solving fluid and structural equations concurrently. The computational efficiency issues of parallel integration of both fluid and structural equations are investigated in detail. This approach, which reduces the total computational time by a factor of almost 2, is demonstrated for a typical aeroelastic wing by using various numbers of processors on the Intel iPSC/860.
Li, Haiou; Lu, Liyao; Chen, Rong; Quan, Lijun; Xia, Xiaoyan; Lü, Qiang
2014-01-01
Structural information related to protein-peptide complexes can be very useful for novel drug discovery and design. The computational docking of protein and peptide can supplement the structural information available on protein-peptide interactions explored experimentally. The protein-peptide docking considered in this paper can be described as three processes that occur in parallel: ab-initio peptide folding, peptide docking with its receptor, and refinement of some flexible areas of the receptor as the peptide is approaching. Several existing methods have been used to sample the degrees of freedom in the three processes, which are usually triggered in an organized sequential scheme. In this paper, we propose a parallel approach that combines all three processes during the docking of a folding peptide with a flexible receptor. This approach mimics the actual protein-peptide docking process in a parallel way, and is expected to deliver better performance than sequential approaches. We used 22 unbound protein-peptide docking examples to evaluate our method. Our analysis of the results showed that the explicit refinement of the flexible areas of the receptor facilitated more accurate modeling of the interfaces of the complexes, while combining all of the moves in parallel helped the construction of energy funnels for predictions.
Determining collective barrier operation skew in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
2015-11-24
Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
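The final calculation described above is simple arithmetic and can be sketched as follows. The function name, the dictionary layout, and the sample timings are hypothetical; the measurement procedure itself (delaying each node in turn and timing the barrier) is not modeled.

```python
def barrier_skew(completion_times):
    """Skew = maximum minus minimum barrier completion time.

    completion_times maps a node identifier to the barrier completion time
    measured when that node was the delayed node.
    """
    if not completion_times:
        raise ValueError("at least one completion time is required")
    times = completion_times.values()
    return max(times) - min(times)

# Made-up measurements in microseconds, one per delayed node.
measured = {0: 12.5, 1: 10.0, 2: 11.0}
skew = barrier_skew(measured)
```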
Labeled trees and the efficient computation of derivations
NASA Technical Reports Server (NTRS)
Grossman, Robert; Larson, Richard G.
1989-01-01
The effective parallel symbolic computation of operators under composition is discussed. Examples include differential operators under composition and vector fields under the Lie bracket. Data structures consisting of formal linear combinations of rooted labeled trees are discussed. A multiplication on rooted labeled trees is defined, thereby making the set of these data structures into an associative algebra. An algebra homomorphism is defined from the original algebra of operators into this algebra of trees. An algebra homomorphism from the algebra of trees into the algebra of differential operators is then described. The cancellation which occurs when noncommuting operators are expressed in terms of commuting ones occurs naturally when the operators are represented using this data structure. This leads to an algorithm which, for operators which are derivations, speeds up the computation exponentially in the degree of the operator. It is shown that the algebra of trees leads naturally to a parallel version of the algorithm.
Determining collective barrier operation skew in a parallel computer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faraj, Daniel A.
Determining collective barrier operation skew in a parallel computer that includes a number of compute nodes organized into an operational group includes: for each of the nodes until each node has been selected as a delayed node: selecting one of the nodes as a delayed node; entering, by each node other than the delayed node, a collective barrier operation; entering, after a delay by the delayed node, the collective barrier operation; receiving an exit signal from a root of the collective barrier operation; and measuring, for the delayed node, a barrier completion time. The barrier operation skew is calculated by: identifying, from the compute nodes' barrier completion times, a maximum barrier completion time and a minimum barrier completion time and calculating the barrier operation skew as the difference of the maximum and the minimum barrier completion time.
Optimizing transformations of stencil operations for parallel cache-based architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bassetti, F.; Davis, K.
This paper describes a new technique for optimizing serial and parallel stencil- and stencil-like operations for cache-based architectures. This technique takes advantage of the semantic knowledge implicit in stencil-like computations. The technique is implemented as a source-to-source program transformation; because of its specificity it could not be expected of a conventional compiler. Empirical results demonstrate a uniform factor of two speedup. The experiments clearly show the benefits of this technique to be a consequence, as intended, of the reduction in cache misses. The test codes are based on a 5-point stencil obtained by the discretization of the Poisson equation and applied to a two-dimensional uniform grid using the Jacobi method as an iterative solver. Results are presented for a 1-D tiling for a single processor, and in parallel using 1-D data partition. For the parallel case both blocking and non-blocking communication are tested. The same scheme of experiments has been performed for the 2-D tiling case. However, for the parallel case the 2-D partitioning is not discussed here, so the parallel case handled for 2-D is 2-D tiling with 1-D data partitioning.
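To make the test problem concrete, the following sketch shows one Jacobi sweep of the 5-point stencil and the same sweep traversed in 1-D tiles (column strips). It only illustrates basic loop tiling and is not the authors' source-to-source transformation. The sign convention (solving laplacian(u) = f), the grid size, and all names are assumptions for illustration.

```python
def _update(u, f, h, i, j):
    """5-point Jacobi update for laplacian(u) = f on a uniform grid of spacing h."""
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                   - h * h * f[i][j])

def jacobi_sweep(u, f, h):
    """One Jacobi sweep over all interior points, in plain row-major order."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]          # boundary values are carried over
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = _update(u, f, h, i, j)
    return new

def jacobi_sweep_tiled(u, f, h, tile):
    """The same sweep, visiting the interior one strip of `tile` columns at a time."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for j0 in range(1, m - 1, tile):     # 1-D tiling along the column index
        j1 = min(j0 + tile, m - 1)
        for i in range(1, n - 1):
            for j in range(j0, j1):
                new[i][j] = _update(u, f, h, i, j)
    return new

# Small illustrative grid: top boundary held at 1.0, zero right-hand side.
n, m = 6, 8
u0 = [[1.0 if i == 0 else 0.0 for j in range(m)] for i in range(n)]
f0 = [[0.0] * m for _ in range(n)]

plain = jacobi_sweep(u0, f0, 1.0)
tiled = jacobi_sweep_tiled(u0, f0, 1.0, tile=3)
```

Both versions read only the old grid and write only the new one, so the traversal order changes memory access patterns without changing the result.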
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Geveci, Berk
2014-11-01
The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based off what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hirata, So
2003-11-20
We develop a symbolic manipulation program and program generator (Tensor Contraction Engine or TCE) that automatically derives the working equations of a well-defined model of second-quantized many-electron theories and synthesizes efficient parallel computer programs on the basis of these equations. Provided an ansatz of a many-electron theory model, TCE performs valid contractions of creation and annihilation operators according to Wick's theorem, consolidates identical terms, and reduces the expressions into the form of multiple tensor contractions acted by permutation operators. Subsequently, it determines the binary contraction order for each multiple tensor contraction with the minimal operation and memory cost, factorizes common binary contractions (defines intermediate tensors), and identifies reusable intermediates. The resulting ordered list of binary tensor contractions, additions, and index permutations is translated into an optimized program that is combined with the NWChem and UTChem computational chemistry software packages. The programs synthesized by TCE take advantage of spin symmetry, Abelian point-group symmetry, and index permutation symmetry at every stage of calculations to minimize the number of arithmetic operations and storage requirement, adjust the peak local memory usage by index range tiling, and support parallel I/O interfaces and dynamic load balancing for parallel executions. We demonstrate the utility of TCE through automatic derivation and implementation of parallel programs for various models of configuration-interaction theory (CISD, CISDT, CISDTQ), many-body perturbation theory [MBPT(2), MBPT(3), MBPT(4)], and coupled-cluster theory (LCCD, CCD, LCCSD, CCSD, QCISD, CCSDT, and CCSDTQ).
INVITED TOPICAL REVIEW: Parallel magnetic resonance imaging
NASA Astrophysics Data System (ADS)
Larkman, David J.; Nunes, Rita G.
2007-04-01
Parallel imaging has been the single biggest innovation in magnetic resonance imaging in the last decade. The use of multiple receiver coils to augment the time consuming Fourier encoding has reduced acquisition times significantly. This increase in speed comes at a time when other approaches to acquisition time reduction were reaching engineering and human limits. A brief summary of spatial encoding in MRI is followed by an introduction to the problem parallel imaging is designed to solve. There are a large number of parallel reconstruction algorithms; this article reviews a cross-section, SENSE, SMASH, g-SMASH and GRAPPA, selected to demonstrate the different approaches. Theoretical (the g-factor) and practical (coil design) limits to acquisition speed are reviewed. The practical implementation of parallel imaging is also discussed, in particular coil calibration. How to recognize potential failure modes and their associated artefacts are shown. Well-established applications including angiography, cardiac imaging and applications using echo planar imaging are reviewed and we discuss what makes a good application for parallel imaging. Finally, active research areas where parallel imaging is being used to improve data quality by repairing artefacted images are also reviewed.
Parallel adaptive wavelet collocation method for PDEs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu
2015-10-01
A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.
NASA Technical Reports Server (NTRS)
Torres-Pomales, Wilfredo; Madden, Michael M.; Butler, Rickey W.; Perry, Raleigh B.
2014-01-01
This report presents analytical and simulation results of an investigation into proposed operational concepts for closely spaced parallel runways, including the Simplified Aircraft-based Paired Approach (SAPA) with alerting and an escape maneuver, MITRE's echelon spacing and no escape maneuver, and a hybrid concept aimed at lowering the visibility minima. We found that the SAPA procedure can be used at 950 ft separations or higher with next-generation avionics and that 1150 ft separations or higher are feasible with current-rule compliant ADS-B OUT. An additional 50 ft reduction in runway separation for the SAPA procedure is possible if different glideslopes are used. For the echelon concept we determined that current generation aircraft cannot conduct paired approaches on parallel paths using echelon spacing on runways less than 1400 ft apart and next-generation aircraft will not be able to conduct paired approaches on runways less than 1050 ft apart. The hybrid concept added alerting and an escape maneuver starting 1 NM from the threshold when flying the echelon concept. This combination was found to be effective, but the probability of a collision can be seriously impacted if the turn component of the escape maneuver has to be disengaged near the ground (e.g. 300 ft or below) due to airport buildings and surrounding terrain. We also found that stabilizing the approach path in the straight-in segment was only possible if the merge point was at least 1.5 to 2 NM from the threshold unless the total system error can be sufficiently constrained on the offset path and final turn.
GPU accelerated dynamic functional connectivity analysis for functional MRI data.
Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu
2015-07-01
Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding-windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. Multicore implementation using OpenMP on an 8-core processor provides up to 7.7× speed-up. GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined in the analysis. Proposed parallel programming solutions showed that multi-core processor and CUDA-supported GPU implementations accelerated the DFC analyses significantly. Developed algorithms make the DFC analyses more practical for multi-subject studies with more dynamic analyses. Copyright © 2015 Elsevier Ltd. All rights reserved.
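The sliding-window computation that the abstract above parallelizes can be sketched serially as follows. This is only an illustration, not the authors' implementation: it assumes Pearson correlation as the connectivity measure (the abstract does not name one), and the function names, the `window` and `step` parameters, and the handling of flat windows are all hypothetical. Each window is independent of the others, which is what makes the work easy to spread over threads or GPU blocks.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat window is reported as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def sliding_window_correlation(ts_a, ts_b, window, step=1):
    """Return one correlation value per window position.

    ts_a and ts_b are two time-courses of equal length; every window
    is computed independently of the others.
    """
    if len(ts_a) != len(ts_b):
        raise ValueError("time-courses must have the same length")
    last_start = len(ts_a) - window
    return [
        pearson(ts_a[s:s + window], ts_b[s:s + window])
        for s in range(0, last_start + 1, step)
    ]
```

A time-course of length 6 with a window of 4 and a step of 1 yields three window positions, so the result has three entries.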
Parallel algorithms for boundary value problems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step, where all the P available processors work in parallel, and the global step, where one processor solves a tridiagonal linear system of order P. The main advantages of this approach are twofold. First, the suggested approach is very flexible, especially in the local step, and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Second, the communication complexity is very small, and thus the algorithm can be used just as easily with shared memory machines. Several examples for using this strategy are discussed.
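The global step described above reduces to solving one tridiagonal system of order P on a single processor. The abstract does not say which solver is used; the sketch below assumes the standard Thomas algorithm (forward elimination followed by back substitution) without pivoting, so it is only appropriate for well-conditioned systems such as diagonally dominant ones. The function name and argument layout are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] is the sub-diagonal entry of row i (lower[0] is unused),
    diag[i] is the diagonal entry of row i,
    upper[i] is the super-diagonal entry of row i (upper[-1] is unused).
    No pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The cost is linear in P, which is consistent with the abstract's point that the serial global step stays small relative to the parallel local step.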
A template-based approach for parallel hexahedral two-refinement
Owen, Steven J.; Shih, Ryan M.; Ernst, Corey D.
2016-10-17
Here, we provide a template-based approach for generating locally refined all-hex meshes. We focus specifically on refinement of initially structured grids utilizing a 2-refinement approach where uniformly refined hexes are subdivided into eight child elements. The refinement algorithm consists of identifying marked nodes that are used as the basis for a set of four simple refinement templates. The target application for 2-refinement is a parallel grid-based all-hex meshing tool for high performance computing in a distributed environment. The result is a parallel consistent locally refined mesh requiring minimal communication and where minimum mesh quality is greater than scaled Jacobian 0.3 prior to smoothing.
Dynamic calibration of pan-tilt-zoom cameras for traffic monitoring.
Song, Kai-Tai; Tai, Jen-Chao
2006-10-01
Pan-tilt-zoom (PTZ) cameras have been widely used in recent years for monitoring and surveillance applications. These cameras provide flexible view selection as well as a wider observation range. This makes them suitable for vision-based traffic monitoring and enforcement systems. To employ PTZ cameras for image measurement applications, one first needs to calibrate the camera to obtain meaningful results. For instance, the accuracy of estimating vehicle speed depends on the accuracy of camera calibration and that of vehicle tracking results. This paper presents a novel calibration method for a PTZ camera overlooking a traffic scene. The proposed approach requires no manual operation to select the positions of special features. It automatically uses a set of parallel lane markings and the lane width to compute the camera parameters, namely, focal length, tilt angle, and pan angle. Image processing procedures have been developed for automatically finding parallel lane markings. Interesting experimental results are presented to validate the robustness and accuracy of the proposed method.
Numerical analysis of propeller induced ground vortices by actuator disk model.
Yang, Y; Veldhuis, L L M; Eitelberg, G
2018-01-01
During the ground operation of aircraft, the interaction between the propulsor-induced flow field and the ground may lead to the generation of ground vortices. Utilizing numerical approaches, the source of vorticity entering ground vortices is investigated. The results show that the production of wall-parallel components of vorticity has a strong contribution from the wall-parallel components of the pressure gradient on the wall, which is generated by the action of the propulsor. This mechanism supplements the vorticity transported from the far-field boundary layer, which has been assumed to be the main vorticity source in a number of previous publications. Furthermore, the quantitative prediction of the occurrence of ground vortices is performed from the numerical results. As the distance of the propeller from the ground decreases, and as the thrust of the propeller increases, ground vortices are generated from the ground and enter the propeller. In addition, the vortices which exist near the ground but do not enter the propeller plane are observed and visualized by three-dimensional data.
Photonics for aerospace sensors
NASA Astrophysics Data System (ADS)
Pellegrino, John; Adler, Eric D.; Filipov, Andree N.; Harrison, Lorna J.; van der Gracht, Joseph; Smith, Dale J.; Tayag, Tristan J.; Viveiros, Edward A.
1992-11-01
The maturation in the state-of-the-art of optical components is enabling increased applications for the technology. Most notable is the ever-expanding market for fiber optic data and communications links, familiar in both commercial and military markets. The inherent properties of optics and photonics, however, have suggested that components and processors may be designed that offer advantages over more commonly considered digital approaches for a variety of airborne sensor and signal processing applications. Various academic, industrial, and governmental research groups have been actively investigating and exploiting these properties of high bandwidth, large degree of parallelism in computation (e.g., processing in parallel over a two-dimensional field), and interconnectivity, and have succeeded in advancing the technology to the stage of systems demonstration. Such advantages as computational throughput and low operating power consumption are highly attractive for many computationally intensive problems. This review covers the key devices necessary for optical signal and image processors, some of the system application demonstration programs currently in progress, and active research directions for the implementation of next-generation architectures.
Track finding in ATLAS using GPUs
NASA Astrophysics Data System (ADS)
Mattmann, J.; Schmitt, C.
2012-12-01
The reconstruction and simulation of collision events is a major task in modern HEP experiments, involving several tens of thousands of standard CPUs. On the other hand, graphics processors (GPUs) have become much more powerful and by far outperform standard CPUs in terms of floating point operations due to their massively parallel approach. The usage of these GPUs could therefore significantly reduce the overall reconstruction time per event or allow for the usage of more sophisticated algorithms. In this paper the track finding in the ATLAS experiment will be used as an example of how GPUs can be used in this context: the implementation on the GPU requires a change in the algorithmic flow to allow the code to work in the rather limited environment on the GPU in terms of memory, cache, and transfer speed from and to the GPU, and to make use of the massively parallel computation. Both the specific implementation of parts of the ATLAS track reconstruction chain and the performance improvements obtained will be discussed.
Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R
2012-01-01
This paper describes a kernel scheduling algorithm that is based on coscheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
B. Hendrickson; T.G. Kolda
1998-09-01
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
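A small sketch may help make the bipartite view concrete. It assumes the usual construction in which every row and every column of the matrix is a vertex and every nonzero a_ij is an edge joining row vertex i to column vertex j; the abstract does not spell this out, so treat it as an assumption. The coordinate-dictionary storage, the function names, and the use of the cut-edge count as a rough stand-in for communication are all hypothetical choices made for illustration.

```python
def matvec(entries, x, n_rows):
    """Compute y = A x, where entries maps (row, col) -> nonzero value."""
    y = [0.0] * n_rows
    for (i, j), a in entries.items():
        y[i] += a * x[j]
    return y


def transpose_matvec(entries, w, n_cols):
    """Compute z = A^T w using the same nonzeros, traversed the other way."""
    z = [0.0] * n_cols
    for (i, j), a in entries.items():
        z[j] += a * w[i]
    return z


def cut_edges(entries, row_part, col_part):
    """Count nonzeros whose row vertex and column vertex live on
    different processors. Each such nonzero is a cut edge of the
    bipartite graph; the count is used here only as a rough proxy
    for the data that must be exchanged in either product.
    """
    return sum(1 for (i, j) in entries if row_part[i] != col_part[j])


# A 2 x 3 example: A = [[1, 0, 2], [0, 3, 0]]
A = {(0, 0): 1.0, (0, 2): 2.0, (1, 1): 3.0}
row_part = [0, 1]      # row i is owned by processor row_part[i]
col_part = [0, 1, 1]   # column j is owned by processor col_part[j]
```

Because a rectangular matrix has separate row and column index sets, the two partitions are independent inputs, which is why a bipartite graph fits where a graph on a single vertex set would not.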
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hepburn, I.; De Schutter, E., E-mail: erik@oist.jp; Theoretical Neurobiology & Neuroengineering, University of Antwerp, Antwerp 2610
Spatial stochastic molecular simulations in biology are limited by the intense computation required to track molecules in space either in a discrete time or discrete space framework, which in recent years has led to the development of parallel methods that can take advantage of the power of modern supercomputers. We systematically test suggested components of stochastic reaction-diffusion operator splitting in the literature and discuss their effects on accuracy. We introduce an operator splitting implementation for irregular meshes that enhances accuracy with minimal performance cost. We test a range of models in small-scale MPI simulations from simple diffusion models to realistic biological models and find that multi-dimensional geometry partitioning is an important consideration for optimum performance. We demonstrate performance gains of 1-3 orders of magnitude in the parallel implementation, with peak performance strongly dependent on model specification.
Mirbozorgi, S Abdollah; Bahrami, Hadi; Sawan, Mohamad; Gosselin, Benoit
2016-04-01
This paper presents a novel experimental chamber with uniform wireless power distribution in 3D for enabling long-term biomedical experiments with small freely moving animal subjects. The implemented power transmission chamber prototype is based on arrays of parallel resonators and multicoil inductive links, to form a novel and highly efficient wireless power transmission system. The power transmitter unit includes several identical resonators enclosed in a scalable array of overlapping square coils which are connected in parallel to provide uniform power distribution along x and y. Moreover, the proposed chamber uses two arrays of primary resonators, facing each other, and connected in parallel to achieve uniform power distribution along the z axis. Each surface includes 9 overlapped coils connected in parallel and implemented into two layers of FR4 printed circuit board. The chamber features a natural power localization mechanism, which simplifies its implementation and eases its operation by avoiding the need for active detection and control mechanisms. A single power surface based on the proposed approach can provide a power transfer efficiency (PTE) of 69% and a power delivered to the load (PDL) of 120 mW, for a separation distance of 4 cm, whereas the complete chamber prototype provides a uniform PTE of 59% and a PDL of 100 mW in 3D, everywhere inside the chamber with a size of 27×27×16 cm^3.
YAPPA: a Compiler-Based Parallelization Framework for Irregular Applications on MPSoCs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lovergine, Silvia; Tumeo, Antonino; Villa, Oreste
Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism, while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets difficult to partition, unpredictable memory accesses, unbalanced control flow and fine grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly-parallel architectures and the high level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), a compilation framework for the automatic parallelization of irregular applications on modern MPSoCs based on LLVM. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.
Framework for Parallel Preprocessing of Microarray Data Using Hadoop
2018-01-01
Nowadays, microarray technology has become one of the popular ways to study gene expression and diagnose disease. The National Center for Biotechnology Information (NCBI) hosts public databases containing large volumes of biological data that must be preprocessed, since they carry high levels of noise and bias. Robust Multiarray Average (RMA) is one of the standard and popular methods used to preprocess the data and remove noise. Most of the preprocessing algorithms are time-consuming and not able to handle a large number of datasets with thousands of experiments. Parallel processing can be used to address the above-mentioned issues. Hadoop is a well-known distributed file system framework that provides a parallel environment to run the experiment. In this research, for the first time, the capability of Hadoop and the statistical power of R have been leveraged to parallelize the available preprocessing algorithm called RMA to efficiently process microarray data. The experiment has been run on a cluster containing 5 nodes, where each node has 16 cores and 16 GB memory. It compares the efficiency and performance of parallelized RMA using Hadoop with parallelized RMA using the affyPara package as well as sequential RMA. The results show that the speed-up of the proposed approach exceeds that of the sequential approach and the affyPara approach. PMID:29796018
Collectively loading programs in a multiple program multiple data environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aho, Michael E.; Attinella, John E.; Gooding, Thomas M.
Techniques are disclosed for loading programs efficiently in a parallel computing system. In one embodiment, nodes of the parallel computing system receive a load description file which indicates, for each program of a multiple program multiple data (MPMD) job, nodes which are to load the program. The nodes determine, using collective operations, a total number of programs to load and a number of programs to load in parallel. The nodes further generate a class route for each program to be loaded in parallel, where the class route generated for a particular program includes only those nodes on which the program needs to be loaded. For each class route, a node is selected using a collective operation to be a load leader which accesses a file system to load the program associated with a class route and broadcasts the program via the class route to other nodes which require the program.
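The planning part of the scheme above can be sketched on a single machine. Everything here is a hypothetical simplification: the load description is modelled as a plain dictionary, a class route is modelled as the sorted list of nodes that need a program, and the collective leader selection is replaced by picking the lowest node id, which the abstract does not specify.

```python
def plan_loads(load_description):
    """Build a per-program loading plan.

    load_description maps a program name to the node ids that must
    load it. The result maps each program to (class_route, leader):
    class_route lists only the nodes that need the program, and
    leader is the one node that would read the file system and then
    broadcast the program image along the route.
    """
    plan = {}
    for program, nodes in load_description.items():
        route = sorted(set(nodes))
        # Assumption: the lowest node id stands in for whatever the
        # real collective selection would choose.
        leader = route[0]
        plan[program] = (route, leader)
    return plan


def file_system_reads(plan):
    """Number of file-system accesses under the plan: one per program."""
    return len(plan)
```

Under this model the file system sees one read per program rather than one per node, which appears to be the efficiency the abstract is after.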
NASA Technical Reports Server (NTRS)
Goldstein, David
1991-01-01
Extensions to an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS) are discussed. PRAIS strives for transparently parallelizing production (rule-based) systems, even under real-time constraints. PRAIS accomplished these goals (presented at the first annual C Language Integrated Production System (CLIPS) conference) by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors. Results using the original PRAIS architecture over a network of Sun 3's, Sun 4's and VAX's are presented. Mechanisms using the producer-consumer model to extend the architecture for fault-tolerance and distributed truth maintenance initiation are also discussed.
A parallel orbital-updating based plane-wave basis method for electronic structure calculations
NASA Astrophysics Data System (ADS)
Pan, Yan; Dai, Xiaoying; de Gironcoli, Stefano; Gong, Xin-Gao; Rignanese, Gian-Marco; Zhou, Aihui
2017-11-01
Motivated by the recently proposed parallel orbital-updating approach in the real-space method [1], we propose a parallel orbital-updating based plane-wave basis method for electronic structure calculations, for solving the corresponding eigenvalue problems. In addition, we propose two new modified parallel orbital-updating methods. Compared to the traditional plane-wave methods, our methods allow for two-level parallelization, which is particularly interesting for large scale parallelization. Numerical experiments show that these new methods are more reliable and efficient for large scale calculations on modern supercomputers.
Hwang, Na-Hyun; Lee, Yoon-Hwan; You, Hi-Jin; Yoon, Eul-Sik; Kim, Deok-Woo
2016-07-01
In recent years, the endoscope-assisted transoral approach for condylar fracture treatment has attracted much attention. However, the surgical approach is technically challenging: the procedure requires specialized instruments and the surgeons experience a steep learning curve. During the transoral endoscopic (TE) approach several instruments are positioned through a narrow oral incision, making endoscope maneuvering very difficult. For this reason, the authors changed the entry port of the endoscope from the transoral to the submandibular area through a small stab incision. The aim of this study is to assess the advantage of using the submandibular endoscopic intraoral approach (SEI). The SEI approach requires an intraoral incision for fracture reduction and fixation, and a 4 mm submandibular stab incision for the endoscope and traction wires. Fifteen patients with condyle neck and subcondyle fractures were operated under the submandibular approach and 15 patients with the same diagnosis were operated under the standard TE approach. The SEI approach allowed clear visualization of the posterior margin of the ramus and condyle, and the visual axis was parallel to the condyle ramus unit. The TE approach clearly shows the anterior margin of the condyle and the sigmoid notch. The surgical time of the SEI group was 128 minutes and the TE group was 120 minutes (P >0.05). All patients in the TE endoscope group were fixated with the trocar system, but only 2 lower neck fracture patients in the SEI group required a trocar. The other 13 subcondyle fractures were fixated with an angulated screw driver (P <0.05). There were no differences in complication and surgical outcomes. The submandibular endoscopic approach has the advantage of more space with good visualization, and facilitated the use of an angulated screw driver.
A Parallel Vector Machine for the PM Programming Language
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. 
Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
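The two mask representations described in the abstract above can be illustrated with a toy example. This is not the PM virtual machine: the function names, the use of element-wise addition as the sample instruction, and the 25% sparsity threshold are all hypothetical, chosen only to show why a list of active locations becomes cheaper than a Boolean field once few cells are active.

```python
def add_masked_bool(dst, a, b, mask):
    """dst[i] = a[i] + b[i] wherever mask[i] is True.

    Every cell is visited, so the cost follows the vector length.
    """
    for i, active in enumerate(mask):
        if active:
            dst[i] = a[i] + b[i]


def add_masked_locations(dst, a, b, locations):
    """dst[i] = a[i] + b[i] for each index i in locations.

    Only active cells are visited, so the cost follows the number
    of active cells.
    """
    for i in locations:
        dst[i] = a[i] + b[i]


def choose_mask(mask, sparse_threshold=0.25):
    """Pick a representation for a Boolean mask.

    Returns ("locations", indices) when the active fraction is at or
    below sparse_threshold, otherwise ("bool", mask). The threshold
    value is an assumption made for this sketch.
    """
    active = [i for i, m in enumerate(mask) if m]
    if len(active) <= sparse_threshold * len(mask):
        return "locations", active
    return "bool", mask
```

A conditional such as an if-then-else would compute the mask once, call something like `choose_mask`, and then dispatch each controlled instruction to the matching masked variant.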
Speculation and replication in temperature accelerated dynamics
Zamora, Richard J.; Perez, Danny; Voter, Arthur F.
2018-02-12
Accelerated Molecular Dynamics (AMD) is a class of MD-based algorithms for the long-time scale simulation of atomistic systems that are characterized by rare-event transitions. Temperature-Accelerated Dynamics (TAD), a traditional AMD approach, hastens state-to-state transitions by performing MD at an elevated temperature. Recently, Speculatively-Parallel TAD (SpecTAD) was introduced, allowing the TAD procedure to exploit parallel computing systems by concurrently executing in a dynamically generated list of speculative future states. Although speculation can be very powerful, it is not always the most efficient use of parallel resources. In this paper, we compare the performance of speculative parallelism with a replica-based technique, similar to the Parallel Replica Dynamics method. A hybrid SpecTAD approach is also presented, in which each speculation process is further accelerated by a local set of replicas. Overall, this work motivates the use of hybrid parallelism whenever possible, as some combination of speculation and replication is typically most efficient.
A Robust and Scalable Software Library for Parallel Adaptive Refinement on Unstructured Meshes
NASA Technical Reports Server (NTRS)
Lou, John Z.; Norton, Charles D.; Cwik, Thomas A.
1999-01-01
The design and implementation of Pyramid, a software library for performing parallel adaptive mesh refinement (PAMR) on unstructured meshes, is described. This software library can be easily used in a variety of unstructured parallel computational applications, including parallel finite element, parallel finite volume, and parallel visualization applications using triangular or tetrahedral meshes. The library contains a suite of well-designed and efficiently implemented modules that perform operations in a typical PAMR process. Among these are mesh quality control during successive parallel adaptive refinement (typically guided by a local-error estimator), parallel load-balancing, and parallel mesh partitioning using the ParMeTiS partitioner. The Pyramid library is implemented in Fortran 90 with an interface to the Message-Passing Interface (MPI) library, supporting code efficiency, modularity, and portability. An EM waveguide filter application, adaptively refined using the Pyramid library, is illustrated.
Sanders, David M.; Decker, Derek E.
1999-01-01
Optical patterns and lithographic techniques are used as part of a process to embed parallel and evenly spaced conductors in the non-planar surfaces of an insulator to produce high gradient insulators. The approach extends the size at which high gradient insulating structures can be fabricated and improves the performance of those insulators by reducing the scale of the alternating parallel lines of insulator and conductor along the surface. This fabrication approach also substantially decreases the cost required to produce high gradient insulators.
NASA Astrophysics Data System (ADS)
Liska, Sebastian; Colonius, Tim
2017-02-01
A new parallel, computationally efficient immersed boundary method for solving three-dimensional, viscous, incompressible flows on unbounded domains is presented. Immersed surfaces with prescribed motions are generated using the interpolation and regularization operators obtained from the discrete delta function approach of the original (Peskin's) immersed boundary method. Unlike Peskin's method, boundary forces are regarded as Lagrange multipliers that are used to satisfy the no-slip condition. The incompressible Navier-Stokes equations are discretized on an unbounded staggered Cartesian grid and are solved in a finite number of operations using lattice Green's function techniques. These techniques are used to automatically enforce the natural free-space boundary conditions and to implement a novel block-wise adaptive grid that significantly reduces the run-time cost of solutions by limiting operations to grid cells in the immediate vicinity and near-wake region of the immersed surface. These techniques also enable the construction of practical discrete viscous integrating factors that are used in combination with specialized half-explicit Runge-Kutta schemes to accurately and efficiently solve the differential algebraic equations describing the discrete momentum equation, incompressibility constraint, and no-slip constraint. Linear systems of equations resulting from the time integration scheme are efficiently solved using an approximation-free nested projection technique. The algebraic properties of the discrete operators are used to reduce projection steps to simple discrete elliptic problems, e.g. discrete Poisson problems, that are compatible with recent parallel fast multipole methods for difference equations. Numerical experiments on low-aspect-ratio flat plates and spheres at Reynolds numbers up to 3700 are used to verify the accuracy and physical fidelity of the formulation.
Ouellet, Jean A.; Richards, Corey; Sardar, Zeeshan M.; Giannitsios, Demetri; Noiseux, Nicholas; Strydom, Willem S.; Reindl, Rudy; Jarzem, Peter; Arlet, Vincent; Steffen, Thomas
2013-01-01
The ideal treatment for unstable thoracolumbar fractures remains controversial, with posterior reduction and stabilization, anterior reduction and stabilization, combined posterior and anterior reduction and stabilization, and even nonoperative management advocated. Short segment posterior osteosynthesis of these fractures has fewer comorbidities than the other operative approaches but settles into kyphosis over time. Biomechanical comparison of the divergent bridge construct versus the parallel tension band construct was performed for anteriorly destabilized T11–L1 spine segments using three different models: (1) finite element analysis (FEA), (2) a synthetic model, and (3) a human cadaveric model. Outcomes measured were construct stiffness and ultimate failure load. Our objective was to determine whether the divergent pedicle screw bridge construct would provide more resistance to kyphotic deforming forces. All three modalities showed greater stiffness with the divergent bridge construct. The FEA calculated a stiffness of 21.6 N/m for the tension band construct versus 34.1 N/m for the divergent bridge construct. The synthetic model resulted in a mean stiffness of 17.3 N/m for the parallel tension band versus 20.6 N/m for the divergent bridge (p = 0.03), whereas the cadaveric model had an average stiffness of 15.2 N/m for the parallel tension band compared with 18.4 N/m for the divergent bridge (p = 0.02). Ultimate failure load with the cadaveric model was found to be 622 N for the divergent bridge construct versus 419 N (p = 0.15) for the parallel tension band construct. This study confirms our clinical experience that the short posterior divergent bridge construct provides greater stiffness for the management of unstable thoracolumbar fractures. PMID:24436856
NASA Astrophysics Data System (ADS)
Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger
2011-06-01
We describe a parallel multi-threaded approach for high-performance modelling of a wide class of phenomena in ultrafast nonlinear optics. A specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
NASA Astrophysics Data System (ADS)
Montoliu, C.; Ferrando, N.; Gosálvez, M. A.; Cerdá, J.; Colom, R. J.
2013-10-01
The use of atomistic methods, such as the Continuous Cellular Automaton (CCA), is currently regarded as a computationally efficient and experimentally accurate approach for the simulation of anisotropic etching of various substrates in the manufacture of Micro-electro-mechanical Systems (MEMS). However, when the features of the chemical process are modified, a time-consuming calibration process needs to be used to transform the new macroscopic etch rates into a corresponding set of atomistic rates. Furthermore, changing the substrate requires a labor-intensive effort to reclassify most atomistic neighborhoods. In this context, the Level Set (LS) method provides an alternative approach where the macroscopic forces affecting the front evolution are directly applied at the discrete level, thus avoiding the need for reclassification and/or calibration. Correspondingly, we present a fully operational Sparse Field Method (SFM) implementation of the LS approach, discussing the algorithm in detail and providing a thorough characterization of the computational cost and simulation accuracy, including a comparison to the performance of the most recent CCA model. We conclude that the SFM implementation achieves accuracy similar to that of the CCA method, with fewer fluctuations in the etch front and roughly 4 times less memory. Although SFM can be up to 2 times slower than CCA for the simulation of anisotropic etchants, it can also be up to 10 times faster than CCA for isotropic etchants. In addition, we present a parallel, GPU-based implementation (gSFM) and compare it to an optimized, multicore CPU version (cSFM), demonstrating that the SFM algorithm can be successfully parallelized and the simulation times consequently reduced, while keeping the accuracy of the simulations. 
Although modern multicore CPUs provide an acceptable option, the massively parallel architecture of modern GPUs is more suitable, as reflected by computational times for gSFM up to 7.4 times faster than for cSFM.
An efficient numerical method for solving the Boltzmann equation in multidimensions
NASA Astrophysics Data System (ADS)
Dimarco, Giacomo; Loubère, Raphaël; Narski, Jacek; Rey, Thomas
2018-01-01
In this paper we deal with the extension of the Fast Kinetic Scheme (FKS) (Dimarco and Loubère, 2013 [26]), originally constructed for solving the BGK equation, to the more challenging case of the Boltzmann equation. The scheme combines a robust and fast method for treating the transport part, based on an innovative Lagrangian technique, with conservative fast spectral schemes that treat the collisional operator by means of an operator splitting approach. This approach, along with several implementation features related to the parallelization of the algorithm, makes it possible to construct an efficient simulation tool, which is numerically tested against exact and reference solutions on classical problems arising in rarefied gas dynamics. We present results up to the 3D × 3D case for unsteady flows for the Variable Hard Sphere model, which may serve as a benchmark for future comparisons between different numerical methods for solving the multidimensional Boltzmann equation. For this reason, we also provide, for each problem studied, details on the computational cost and memory consumption as well as comparisons with the BGK model or the limit model of the compressible Euler equations.
NASA Technical Reports Server (NTRS)
Giulianetti, Demo J.
2001-01-01
Ground and airborne technologies were developed in the Terminal Area Productivity (TAP) project for increasing throughput at major airports by safely maintaining good-weather operating capacity during bad weather. Methods were demonstrated for accurately predicting vortices to prevent wake-turbulence encounters and to reduce in-trail separation requirements for aircraft approaching the same runway for landing. Technology was demonstrated that safely enabled independent simultaneous approaches in poor weather conditions to parallel runways spaced less than 3,400 ft apart. Guidance, control, and situation-awareness systems were developed to reduce congestion in airport surface operations resulting from the increased throughput, particularly during night and instrument meteorological conditions (IMC). These systems decreased runway occupancy time by safely and smoothly decelerating the aircraft, increasing taxi speed, and safely steering the aircraft off the runway. Simulations were performed in which optimal trajectories were determined by air traffic control (ATC) and communicated to flight crews by means of Center TRACON Automation System/Flight Management System (CTAS/FMS) automation to reduce flight delays, increase throughput, and ensure flight safety.
High-throughput NGL electron-beam direct-write lithography system
NASA Astrophysics Data System (ADS)
Parker, N. William; Brodie, Alan D.; McCoy, John H.
2000-07-01
Electron beam lithography systems have historically had low throughput. The only practical solution to this limitation is an approach using many beams writing simultaneously. For single-column multi-beam systems, including projection optics (SCALPEL and PREVAIL) and blanked aperture arrays, throughput and resolution are limited by space-charge effects. Multibeam micro-column (one beam per column) systems are limited by the need for low voltage operation, electrical connection density and fabrication complexities. In this paper, we discuss a new multi-beam concept employing multiple columns each with multiple beams to generate a very large total number of parallel writing beams. This overcomes the limitations of space-charge interactions and low voltage operation. We also discuss a rationale leading to the optimum number of columns and beams per column. Using this approach we show how production throughputs >= 60 wafers per hour can be achieved at CDs
Parallelized modelling and solution scheme for hierarchically scaled simulations
NASA Technical Reports Server (NTRS)
Padovan, Joe
1995-01-01
This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.
Wake vortex effects on parallel runway operations
DOT National Transportation Integrated Search
2003-01-06
Aircraft wake vortex behavior in ground effect between two parallel runways at Frankfurt/Main International Airport was studied. The distance and time of vortex demise were examined as a function of crosswind, aircraft type, and a measure of atmosphe...
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
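The role of fictitious cells in a decomposed grid can be illustrated with a small sketch. It is a one-dimensional toy in plain Python, not the paper's unstructured-grid SIMPLE implementation: the 3-point averaging stencil, the function names, and the equal-sized partitioning are all assumptions made for illustration. Each subdomain receives one copy of its neighbour's edge value, which stands in for an interprocessor exchange, and the decomposed sweep then reproduces the serial result.

```python
def smooth_serial(u):
    # 3-point average on interior cells; the two end values are held fixed.
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]

def smooth_decomposed(u, parts):
    # Assumes len(u) >= parts so that every subdomain owns at least one cell.
    n = len(u)
    out = []
    for p in range(parts):
        lo, hi = p * n // parts, (p + 1) * n // parts
        # Fictitious cells: one copy of each neighbouring subdomain's edge value
        # (on a real cluster this copy would be an interprocessor message).
        left = [u[lo - 1]] if lo > 0 else []
        right = [u[hi]] if hi < n else []
        local = left + u[lo:hi] + right
        first = len(left)  # local index of the first owned cell
        for k in range(first, first + (hi - lo)):
            if k == 0 or k == len(local) - 1:
                out.append(local[k])  # physical boundary: held fixed
            else:
                out.append((local[k - 1] + local[k] + local[k + 1]) / 3.0)
    return out

u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
same = smooth_decomposed(u, 3) == smooth_serial(u)
```

Only the edge values cross subdomain boundaries in this sketch, which is why the way the data are stored affects how many exchanges are needed.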
Phase Reconstruction from FROG Using Genetic Algorithms [Frequency-Resolved Optical Gating]
DOE Office of Scientific and Technical Information (OSTI.GOV)
Omenetto, F.G.; Nicholson, J.W.; Funk, D.J.
1999-04-12
The authors describe a new technique for obtaining the phase and electric field from FROG measurements using genetic algorithms. Frequency-Resolved Optical Gating (FROG) has gained prominence as a technique for characterizing ultrashort pulses. FROG consists of a spectrally resolved autocorrelation of the pulse to be measured. Typically a combination of iterative algorithms is used, applying constraints from experimental data, and alternating between the time and frequency domain, in order to retrieve an optical pulse. The authors have developed a new approach to retrieving the intensity and phase from FROG data using a genetic algorithm (GA). A GA is a general parallel search technique that operates on a population of potential solutions simultaneously. Operators in a genetic algorithm, such as crossover, selection, and mutation, are based on ideas taken from evolution.
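The three operators named in the abstract can be shown in a minimal sketch. This is not the authors' FROG retrieval code: the one-dimensional toy objective, the population size, the truncation selection, the blend crossover, and the mutation rate are all assumptions chosen only to make the loop short and runnable.

```python
import random

def fitness(x):
    # Toy objective (an assumption): a single maximum at x = 3.
    return -(x - 3.0) ** 2

def evolve(pop_size=30, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population unchanged.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: blend two distinct parents.
            child = 0.5 * (a + b)
            # Mutation: occasional Gaussian perturbation.
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 1.0)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

Because the surviving parents are carried over unchanged, the best candidate never gets worse from one generation to the next. The fitness evaluations within a generation are independent of each other, which is what makes the population-based search parallel in nature.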
Numerical modeling for the retrofit of the hydraulic cooling subsystems in operating power plant
NASA Astrophysics Data System (ADS)
AlSaqoor, S.; Alahmer, A.; Al Quran, F.; Andruszkiewicz, A.; Kubas, K.; Regucki, P.; Wędrychowicz, W.
2017-08-01
This paper presents the possibility of using numerical methods to analyze the operation of hydraulic systems, using the cooling system of a power boiler's auxiliary devices as an example. The variety of conditions under which hydraulic systems operate in specific engineering subsystems requires an individualized approach to the model solutions developed for modernizing these systems. A mathematical model of series-parallel propagation of the cooling water was derived, and iterative methods were used to solve the system of nonlinear equations. The results of the numerical calculations made it possible to analyze different variants of a modernization of the studied system and to indicate its critical elements. An economic analysis of the different options allows an investor to choose an optimal variant for reconstructing the installation.
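A much-reduced illustration of solving a nonlinear flow-balance equation iteratively is sketched below. It assumes a single pair of parallel branches with a quadratic head-loss law h = k·Q², a common textbook simplification; the function name, the resistance coefficients, and the use of bisection are illustrative assumptions and are not taken from the paper's model.

```python
def split_flow(q_total, k1, k2, tol=1e-10):
    """Split q_total between two parallel branches by bisection.

    Assumes head loss h = k * q**2 in each branch; both branches must
    see the same head loss because they share inlet and outlet nodes.
    """
    def residual(q1):
        q2 = q_total - q1
        return k1 * q1 * q1 - k2 * q2 * q2

    lo, hi = 0.0, q_total  # residual(lo) < 0 < residual(hi), monotonic between
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    q1 = 0.5 * (lo + hi)
    return q1, q_total - q1

q1, q2 = split_flow(10.0, 1.0, 4.0)
```

For this two-branch case the result can be checked against the closed form Q1 = Q·√k2 / (√k1 + √k2). A real series-parallel network leads to a coupled system of such equations, for which a multidimensional iterative solver would be needed.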
de Hoz, Livia; Gierej, Dorota; Lioudyno, Victoria; Jaworski, Jacek; Blazejczyk, Magda; Cruces-Solís, Hugo; Beroun, Anna; Lebitko, Tomasz; Nikolaev, Tomasz; Knapska, Ewelina; Nelken, Israel; Kaczmarek, Leszek
2018-05-01
The behavioral changes that comprise operant learning are associated with plasticity in early sensory cortices as well as with modulation of gene expression, but the connection between the behavioral, electrophysiological, and molecular changes is only partially understood. We specifically manipulated c-Fos expression, a hallmark of learning-induced synaptic plasticity, in auditory cortex of adult mice using a novel approach based on RNA interference. Locally blocking c-Fos expression caused a specific behavioral deficit in a sound discrimination task, in parallel with decreased cortical experience-dependent plasticity, without affecting baseline excitability or basic auditory processing. Thus, c-Fos-dependent experience-dependent cortical plasticity is necessary for frequency discrimination in an operant behavioral task. Our results connect behavioral, molecular and physiological changes and demonstrate a role of c-Fos in experience-dependent plasticity and learning.
NASA Astrophysics Data System (ADS)
Yoon, S.
2016-12-01
To define a geodetic reference frame using GPS data collected by the Continuously Operating Reference Stations (CORS) network, historical GPS data need to be reprocessed regularly. Reprocessing GPS data collected by up to 2000 CORS sites over the last two decades requires substantial computational resources. At the National Geodetic Survey (NGS), one reprocessing was completed in 2011, and the second reprocessing is currently under way. For the first reprocessing effort, in-house computing resources were used. In the current second reprocessing effort, an outsourced cloud computing platform is being used. In this presentation, the outline of the data processing strategy at NGS is described, as well as the effort to parallelize the data processing procedure in order to maximize the benefit of cloud computing. The time and cost savings realized by the cloud computing approach will also be discussed.
Secure Network-Centric Aviation Communication (SNAC)
NASA Technical Reports Server (NTRS)
Nelson, Paul H.; Muha, Mark A.; Sheehe, Charles J.
2017-01-01
The existing National Airspace System (NAS) communications capabilities are largely unsecured, are not designed for efficient use of spectrum, and collectively are not capable of servicing the future needs of the NAS with the inclusion of new operators in Unmanned Aviation Systems (UAS) or On Demand Mobility (ODM). SNAC will provide a ubiquitous, secure, network-based communications architecture that will provide new service capabilities and allow for the migration of current communications to SNAC over time. The necessary change in communication technologies to digital domains will allow for the adoption of security mechanisms, sharing of link technologies, a large increase in spectrum utilization, new forms of resilience and redundancy, and the possibility of spectrum reuse. SNAC consists of a long-term open architectural approach with increasingly capable designs used to steer research and development and enable operating capabilities that run in parallel with current NAS systems.
NASA Technical Reports Server (NTRS)
Lee, L. F.; Cooper, L. P.
1993-01-01
This article describes the approach, results, and lessons learned from an applied research project demonstrating how artificial intelligence (AI) technology can be used to improve Deep Space Network operations. Configuring antenna and associated equipment necessary to support a communications link is a time-consuming process. The time spent configuring the equipment is essentially overhead and results in reduced time for actual mission support operations. The NASA Office of Space Communications (Code O) and the NASA Office of Advanced Concepts and Technology (Code C) jointly funded an applied research project to investigate technologies which can be used to reduce configuration time. This resulted in the development and application of AI-based automated operations technology in a prototype system, the Link Monitor and Control Operator Assistant (LMC OA). The LMC OA was tested over the course of three months in a parallel experimental mode on very long baseline interferometry (VLBI) operations at the Goldstone Deep Space Communications Center. The tests demonstrated a 44 percent reduction in pre-calibration time for a VLBI pass on the 70-m antenna. Currently, this technology is being developed further under Research and Technology Operating Plan (RTOP)-72 to demonstrate the applicability of the technology to operations in the entire Deep Space Network.
Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che
2014-01-16
Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. 
By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Terminal Area Procedures for Paired Runways
NASA Technical Reports Server (NTRS)
Lozito, Sandy
2011-01-01
Parallel runway operations have been found to increase capacity within the National Airspace System (NAS); however, poor visibility conditions reduce this capacity [1]. Much research has been conducted to examine the concepts and procedures related to parallel runways, but there has been no investigation of the procedures associated with the strategic and tactical pairing of aircraft for these operations. This study developed and examined the pilot and controller procedures and information requirements for creating aircraft pairs for parallel runway operations. The goal was to achieve aircraft pairing with a temporal separation of 15 s (+/- 10 s error) at a coupling point about 12 nmi from the runway threshold. Two variables were explored for the pilot participants: two levels of flight deck automation (current-day flight deck automation and a prototype future automation) and two flight deck displays that assisted in pilot conformance monitoring. The controllers were also provided with automation to help create and maintain aircraft pairs. The data showed that the operations in this study were acceptable and safe. Workload when using the pairing procedures and tools was generally low for both controllers and pilots, and situation awareness (SA) was typically moderate to high. There were some differences based upon the display and automation conditions for the pilots. Future research should consider the refinement of the concepts and tools for pilot and controller displays and automation for parallel runway concepts.
NASA Astrophysics Data System (ADS)
Wanguang, Sun; Chengzhen, Li; Baoshan, Fan
2018-06-01
Rivers dry up frequently in the West Liaohe River plain, and the bare riverbeds form belts of fine sand on land. These sand belts, which produce heavy dust on windy days, place severe stress on the local environment as the riverbeds are eroded by wind. The optimal operation of water resources is therefore one of the most important methods for preventing wind erosion of riverbeds. In this paper, an optimal operation model for water resources based on riverbed wind erosion control is established, comprising an objective function, constraints, and a solution method. The objective function considers factors including the water volume diverted into reservoirs, river length, and the lower threshold of flow rate. While the water requirement of each reservoir is ensured, destruction of riverbed vegetation by frequent river flow is avoided. A multi-core parallel solving method for optimal water resources operation in the West Liaohe River plain is proposed, in which the optimal solution is found by the DPSA method under the POA framework and the parallel computing program is designed in Fork/Join mode. Based on the optimal operation results, the basic rules of water resources operation in the West Liaohe River plain are summarized. Calculation results show that, while the water volume requirement of every reservoir is met, the frequency of river flow in the reach from Taihekou to the Talagan Water Diversion Project on the Xinkai River is reduced effectively. The speedup and parallel efficiency of the parallel algorithm are 1.51 and 0.76, respectively, and the computing time is significantly decreased. The research results shown in this paper can provide technical support for the prevention and control of riverbed wind erosion in the West Liaohe River plain.
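The two performance figures quoted above are linked by the standard definitions speedup = T_serial / T_parallel and efficiency = speedup / number of workers. The sketch below applies them; the worker count of 2 is an inference from the reported numbers, not something the abstract states, and the function names are illustrative.

```python
def speedup(t_serial, t_parallel):
    # Ratio of serial run time to parallel run time.
    return t_serial / t_parallel

def parallel_efficiency(s, workers):
    # Fraction of ideal linear speedup actually achieved.
    return s / workers

# Assumption: 2 workers, which is what the reported pair (1.51, 0.76) suggests.
eff = parallel_efficiency(1.51, 2)
```

With two workers the efficiency comes out at about 0.755, close to the reported 0.76.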
36 CFR Appendix D to Part 1191 - Technical
Code of Federal Regulations, 2014 CFR
2014-07-01
... inch (13 mm) high shall be ramped, and shall comply with 405 or 406. 304 Turning Space 304.1 General... ground space allows a parallel approach to an element and the side reach is unobstructed, the high side....2 Obstructed High Reach. Where a clear floor or ground space allows a parallel approach to an element and the...
On extending parallelism to serial simulators
NASA Technical Reports Server (NTRS)
Nicol, David; Heidelberger, Philip
1994-01-01
This paper describes an approach to discrete event simulation modeling that appears to be effective for developing portable and efficient parallel execution of models of large distributed systems and communication networks. In this approach, the modeler develops submodels using an existing sequential simulation modeling tool, using the full expressive power of the tool. A set of modeling language extensions permits automatically synchronized communication between submodels; however, the automation requires that any such communication must take a nonzero amount of simulation time. Within this modeling paradigm, a variety of conservative synchronization protocols can transparently support conservative execution of submodels on potentially different processors. A specific implementation of this approach, U.P.S. (Utilitarian Parallel Simulator), is described, along with performance results on the Intel Paragon.
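The nonzero-delay requirement is what gives a conservative protocol room to work: if every message takes at least some minimum simulation time to arrive, a submodel can safely process its own events up to a bound computed from its neighbours' clocks. The sketch below shows that bound in isolation. The function names and data layout are hypothetical and are not the U.P.S. interface.

```python
def safe_horizon(neighbor_clocks, min_delay):
    """Earliest time at which a message from any neighbour could arrive."""
    if min_delay <= 0:
        raise ValueError("conservative execution needs a nonzero delay")
    return min(neighbor_clocks) + min_delay

def advance(local_events, neighbor_clocks, min_delay):
    # Events strictly before the horizon cannot be affected by any
    # message still to come, so they may be processed now.
    horizon = safe_horizon(neighbor_clocks, min_delay)
    ready = sorted(t for t in local_events if t < horizon)
    pending = sorted(t for t in local_events if t >= horizon)
    return ready, pending

ready, pending = advance([1.0, 2.5, 4.0], [2.0, 3.0], 1.0)
```

Here the slowest neighbour is at time 2.0 and the minimum delay is 1.0, so events before time 3.0 are safe and the event at 4.0 must wait.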
Stage-by-Stage and Parallel Flow Path Compressor Modeling for a Variable Cycle Engine
NASA Technical Reports Server (NTRS)
Kopasakis, George; Connolly, Joseph W.; Cheng, Larry
2015-01-01
This paper covers the development of stage-by-stage and parallel flow path compressor modeling approaches for a Variable Cycle Engine. The stage-by-stage compressor modeling approach is an extension of a technique for lumped volume dynamics and performance characteristic modeling. It was developed to improve the accuracy of axial compressor dynamics over lumped volume dynamics modeling. The stage-by-stage compressor model presented here is formulated into a parallel flow path model that includes both axial and rotational dynamics. This is done to enable the study of compressor and propulsion system dynamic performance under flow distortion conditions. The approaches utilized here are generic and should be applicable for the modeling of any axial flow compressor design.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing
NASA Astrophysics Data System (ADS)
Decyk, V. K.; Dauger, D. E.
We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Distributed communications and control network for robotic mining
NASA Technical Reports Server (NTRS)
Schiffbauer, William H.
1989-01-01
The application of robotics to coal mining machines is one approach pursued to increase productivity while providing enhanced safety for the coal miner. Toward that end, a network composed of microcontrollers, computers, expert systems, real-time operating systems, and a variety of programming languages is being integrated to act as the backbone for intelligent machine operation. Actual mining machines, including a few customized ones, have been given telerobotic semiautonomous capabilities by applying the described network. Control devices, intelligent sensors and computers onboard these machines are showing promise of achieving improved mining productivity and safety benefits. Current research using these machines involves navigation, multiple machine interaction, machine diagnostics, mineral detection, and graphical machine representation. Guidance sensors and systems employed include sonar, laser rangers, gyroscopes, magnetometers, clinometers, and accelerometers. Information on the network of hardware/software and its implementation on mining machines is presented. Anticipated coal production operations using the network are discussed. A parallel is also drawn between the direction of present-day underground coal mining research and how the lunar soil (regolith) may be mined. A conceptual lunar mining operation that employs a distributed communication and control network is detailed.
Lu, Zhao; Sun, Jing; Butts, Kenneth
2016-02-03
A giant leap has been made in the past couple of decades with the introduction of kernel-based learning as a mainstay for designing effective nonlinear computational learning algorithms. In view of the geometric interpretation of conditional expectation and the ubiquity of multiscale characteristics in highly complex nonlinear dynamic systems [1]-[3], this paper presents a new orthogonal projection operator wavelet kernel, aiming at developing an efficient computational learning approach for nonlinear dynamical system identification. In the framework of multiresolution analysis, the proposed projection operator wavelet kernel can fulfill the multiscale, multidimensional learning to estimate complex dependencies. The special advantage of the projection operator wavelet kernel developed in this paper lies in the fact that it has a closed-form expression, which greatly facilitates its application in kernel learning. To the best of our knowledge, it is the first closed-form orthogonal projection wavelet kernel reported in the literature. It provides a link between grid-based wavelets and mesh-free kernel-based methods. Simulation studies for identifying the parallel models of two benchmark nonlinear dynamical systems confirm its superiority in model accuracy and sparsity.
Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C
2011-01-01
Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.
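The abstract above does not state which entropy estimator is used. As a minimal sketch, assuming the common formulation of MEE based on Rényi's quadratic entropy with a Gaussian Parzen window, the cost can be computed as below. The function names and the sample error values are hypothetical. The double loop over error pairs is one way to see where the extra computational cost relative to mean-squared error comes from, and each pairwise term is independent, which is what makes the computation easy to parallelize.

```python
import math

def information_potential(errors, sigma):
    """Parzen estimate of the quadratic information potential V(e).

    Uses a Gaussian kernel of width sigma; the pairwise kernel has
    variance 2*sigma^2. Cost is O(N^2) in the number of error samples.
    """
    n = len(errors)
    s2 = 2.0 * sigma * sigma
    norm = 1.0 / math.sqrt(2.0 * math.pi * s2)
    total = 0.0
    for ei in errors:
        for ej in errors:
            d = ei - ej
            total += norm * math.exp(-d * d / (2.0 * s2))
    return total / (n * n)

def renyi_quadratic_entropy(errors, sigma):
    """Entropy estimate H2(e) = -log V(e); minimizing H2 maximizes V."""
    return -math.log(information_potential(errors, sigma))

# Hypothetical error samples: a well-adapted filter and a poorly adapted one.
tight_errors = [0.01, -0.02, 0.00, 0.015]
spread_errors = [1.0, -2.0, 0.5, 3.0]

h_tight = renyi_quadratic_entropy(tight_errors, sigma=0.5)
h_spread = renyi_quadratic_entropy(spread_errors, sigma=0.5)
```

Errors that cluster tightly give a lower entropy estimate than errors that are spread out, so an adaptive filter trained with this cost is pushed toward concentrated error distributions.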
NASA Astrophysics Data System (ADS)
Zatarain Salazar, Jazmin; Reed, Patrick M.; Quinn, Julianne D.; Giuliani, Matteo; Castelletti, Andrea
2017-11-01
Reservoir operations are central to our ability to manage river basin systems serving conflicting multi-sectoral demands under increasingly uncertain futures. These challenges motivate the need for new solution strategies capable of effectively and efficiently discovering the multi-sectoral tradeoffs that are inherent to alternative reservoir operation policies. Evolutionary many-objective direct policy search (EMODPS) is gaining importance in this context due to its capability of addressing multiple objectives and its flexibility in incorporating multiple sources of uncertainties. This simulation-optimization framework has high potential for addressing the complexities of water resources management, and it can benefit from current advances in parallel computing and meta-heuristics. This study contributes a diagnostic assessment of state-of-the-art parallel strategies for the auto-adaptive Borg Multi Objective Evolutionary Algorithm (MOEA) to support EMODPS. Our analysis focuses on the Lower Susquehanna River Basin (LSRB) system where multiple sectoral demands from hydropower production, urban water supply, recreation and environmental flows need to be balanced. Using EMODPS with different parallel configurations of the Borg MOEA, we optimize operating policies over different size ensembles of synthetic streamflows and evaporation rates. As we increase the ensemble size, we increase the statistical fidelity of our objective function evaluations at the cost of higher computational demands. This study demonstrates how to overcome the mathematical and computational barriers associated with capturing uncertainties in stochastic multiobjective reservoir control optimization, where parallel algorithmic search serves to reduce the wall-clock time in discovering high quality representations of key operational tradeoffs. Our results show that emerging self-adaptive parallelization schemes exploiting cooperative search populations are crucial. 
Such strategies provide a promising new set of tools for effectively balancing exploration, uncertainty, and computational demands when using EMODPS.
Vortex-Free Flight Corridors for Aircraft Executing Compressed Landing Operations
NASA Technical Reports Server (NTRS)
Rossow, Vernon J.
2006-01-01
A factor that limits airport arrival and departure rates is the need to wait between operations for the wake vortices of preceding aircraft to decay to a safe level. As airport traffic demand increases, creative methods will be needed to overcome the limitations caused by the hazard posed by vortex wakes so that airport capacities can be increased. The problem addressed here is the design of vortex-free trajectories for aircraft as they fly from their cruise altitudes down to their final approach paths and to a landing. The guidelines presented recommend that the flight path of each aircraft in a group executing nearly-simultaneous landings be spaced far enough apart laterally along organized flight paths so that the vortex wakes of preceding aircraft will not intrude into the airspace to be used by following aircraft. An example is presented as to how a combination of straight lines and circular arcs is able to provide each aircraft in a group with a vortex-free trajectory so that all are able to safely form the pattern needed for nearly simultaneous landings on a set of closely-spaced parallel runways. Although the guidelines are described for aircraft on approach, they are also applicable to departure, and to en route operations.
A tesselated probabilistic representation for spatial robot perception and navigation
NASA Technical Reports Server (NTRS)
Elfes, Alberto
1989-01-01
The ability to recover robust spatial descriptions from sensory information and to efficiently utilize these descriptions in appropriate planning and problem-solving activities are crucial requirements for the development of more powerful robotic systems. Traditional approaches to sensor interpretation, with their emphasis on geometric models, are of limited use for autonomous mobile robots operating in and exploring unknown and unstructured environments. Here, researchers present a new approach to robot perception that addresses such scenarios using a probabilistic tesselated representation of spatial information called the Occupancy Grid. The Occupancy Grid is a multi-dimensional random field that maintains stochastic estimates of the occupancy state of each cell in the grid. The cell estimates are obtained by interpreting incoming range readings using probabilistic models that capture the uncertainty in the spatial information provided by the sensor. A Bayesian estimation procedure allows the incremental updating of the map using readings taken from several sensors over multiple points of view. An overview of the Occupancy Grid framework is given, and its application to a number of problems in mobile robot mapping and navigation are illustrated. It is argued that a number of robotic problem-solving activities can be performed directly on the Occupancy Grid representation. Some parallels are drawn between operations on Occupancy Grids and related image processing operations.
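The incremental Bayesian update described above can be illustrated with a small sketch. It assumes the log-odds form of the update with an uninformative prior of 0.5, which is a common way to write the rule but not necessarily the form used in the report. The one-dimensional grid, the sensor probabilities, and all function names are hypothetical.

```python
import math

def logit(p):
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))

def update_cell(log_odds, p_occupied_given_reading):
    """Bayesian update of one cell in log-odds form (prior of 0.5 assumed)."""
    return log_odds + logit(p_occupied_given_reading)

def probability(log_odds):
    """Convert log-odds back to P(occupied)."""
    return 1.0 - 1.0 / (1.0 + math.exp(log_odds))

# 1-D grid of 5 cells, all starting at P(occupied) = 0.5 (log-odds 0).
grid = [0.0] * 5

# Hypothetical sensor model output: two range readings that each suggest
# cells 0-2 are probably free and cell 3 is probably occupied.
readings = [
    {0: 0.3, 1: 0.3, 2: 0.3, 3: 0.8},
    {0: 0.3, 1: 0.3, 2: 0.3, 3: 0.8},
]
for reading in readings:
    for cell, p in reading.items():
        grid[cell] = update_cell(grid[cell], p)

probs = [probability(value) for value in grid]
```

Each reading only adds a term to the affected cells, so readings from several sensors or viewpoints can be folded in one at a time, and cells that no reading touches stay at the prior.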
Hadoop for High-Performance Climate Analytics: Use Cases and Lessons Learned
NASA Technical Reports Server (NTRS)
Tamkin, Glenn
2013-01-01
Scientific data services are a critical aspect of the NASA Center for Climate Simulations (NCCS) mission. Hadoop, via MapReduce, provides an approach to high-performance analytics that is proving to be useful to data intensive problems in climate research. It offers an analysis paradigm that uses clusters of computers and combines distributed storage of large data sets with parallel computation. The NCCS is particularly interested in the potential of Hadoop to speed up basic operations common to a wide range of analyses. In order to evaluate this potential, we prototyped a series of canonical MapReduce operations over a test suite of observational and climate simulation datasets. The initial focus was on averaging operations over arbitrary spatial and temporal extents within Modern Era Retrospective-Analysis for Research and Applications (MERRA) data. After preliminary results suggested that this approach improves efficiencies within data intensive analytic workflows, we invested in building a cyber infrastructure resource for developing a new generation of climate data analysis capabilities using Hadoop. This resource is focused on reducing the time spent in the preparation of reanalysis data used in data-model inter-comparison, a long sought goal of the climate community. This paper summarizes the related use cases and lessons learned.
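The averaging operation mentioned above fits the MapReduce pattern naturally. The following is a plain-Python sketch of that pattern, not Hadoop code and not the NCCS implementation. The record layout, the key choice (variable name and month), and the values are made up for illustration.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, (value, 1)) pairs; the key is a (variable, month) tuple."""
    for variable, month, value in records:
        yield (variable, month), (value, 1)

def reduce_phase(pairs):
    """Combine partial (sum, count) pairs per key, then divide for the mean."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (partial_sum, count) in pairs:
        totals[key][0] += partial_sum
        totals[key][1] += count
    return {key: s / n for key, (s, n) in totals.items()}

# Hypothetical input records: (variable, month, value).
records = [
    ("T2M", "1990-01", 271.0),
    ("T2M", "1990-01", 273.0),
    ("T2M", "1990-02", 275.0),
]
means = reduce_phase(map_phase(records))
```

Carrying (sum, count) pairs rather than running means keeps the reduce step associative, so partial results from different nodes can be merged in any order.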
ERIC Educational Resources Information Center
Green, Samuel B.; Levy, Roy; Thompson, Marilyn S.; Lu, Min; Lo, Wen-Juo
2012-01-01
A number of psychometricians have argued for the use of parallel analysis to determine the number of factors. However, parallel analysis must be viewed at best as a heuristic approach rather than a mathematically rigorous one. The authors suggest a revision to parallel analysis that could improve its accuracy. A Monte Carlo study is conducted to…
NASA Astrophysics Data System (ADS)
Cai, Yong; Cui, Xiangyang; Li, Guangyao; Liu, Wenyang
2018-04-01
The edge-smooth finite element method (ES-FEM) can improve the computational accuracy of triangular shell elements and the mesh partition efficiency of complex models. In this paper, an approach is developed to perform explicit finite element simulations of contact-impact problems with a graphical processing unit (GPU) using a special edge-smooth triangular shell element based on ES-FEM. Of critical importance for this problem is achieving finer-grained parallelism to enable efficient data loading and to minimize communication between the device and host. Four kinds of parallel strategies are then developed to efficiently solve these ES-FEM based shell element formulas, and various optimization methods are adopted to ensure aligned memory access. Special focus is dedicated to developing an approach for the parallel construction of edge systems. A parallel hierarchy-territory contact-searching algorithm (HITA) and a parallel penalty function calculation method are embedded in this parallel explicit algorithm. Finally, the program flow is well designed, and a GPU-based simulation system is developed, using Nvidia's CUDA. Several numerical examples are presented to illustrate the high quality of the results obtained with the proposed methods. In addition, the GPU-based parallel computation is shown to significantly reduce the computing time.
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-06-01
We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.
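The soft-thresholding step named above can be sketched as follows. This assumes the standard group shrinkage operator for joint sparsity, in which the coefficients at one wavelet position are treated as a vector across channels and shrunk together by their l2 norm. The abstract does not spell out the operator, so this form, the function name, and the real-valued sample coefficients are assumptions (MRI data would be complex-valued).

```python
import math

def joint_soft_threshold(coeffs, lam):
    """Shrink cross-channel coefficient vectors toward zero.

    coeffs[c][k] is wavelet coefficient k of channel c. For each position k
    the vector across channels is scaled by max(0, 1 - lam / norm).
    """
    n_channels = len(coeffs)
    n_coeffs = len(coeffs[0])
    out = [[0.0] * n_coeffs for _ in range(n_channels)]
    for k in range(n_coeffs):
        norm = math.sqrt(sum(coeffs[c][k] ** 2 for c in range(n_channels)))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        for c in range(n_channels):
            out[c][k] = scale * coeffs[c][k]
    return out

# Two channels, two coefficient positions: one strong, one weak.
coeffs = [[3.0, 0.1],
          [4.0, 0.1]]
shrunk = joint_soft_threshold(coeffs, lam=1.0)
```

The strong position (norm 5) is scaled by 0.8 in both channels, while the weak position falls below the threshold and is set to zero everywhere. Each position is processed independently, which is part of why this step parallelizes well on GPUs and multi-core CPUs.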
Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael
2012-01-01
We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
NASA Astrophysics Data System (ADS)
Decyk, Viktor K.; Dauger, Dean E.
We have constructed a parallel cluster consisting of Apple Macintosh G4 computers running both Classic Mac OS as well as the Unix-based Mac OS X, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. Unlike other Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the mainstream of computing.
Research on Parallel Three Phase PWM Converters base on RTDS
NASA Astrophysics Data System (ADS)
Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun
2018-01-01
Parallel operation of converters can increase system capacity, but it may give rise to zero-sequence circulating current, so controlling the circulating current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the parallel converter system in real time and to study circulating current suppression. An equivalent model of two parallel converters and of the zero-sequence circulating current (ZSCC) was established and analyzed, and a strategy using variable zero vector control was then proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were carried out; the results show that the proposed control strategy is feasible and effective.
Ropes: Support for collective operations among distributed threads
NASA Technical Reports Server (NTRS)
Haines, Matthew; Mehrotra, Piyush; Cronk, David
1995-01-01
Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. Recently, systems have been designed and implemented to support interprocessor communication between lightweight threads so that threads can be exploited in a distributed memory system. Their use, in this setting, has been largely restricted to supporting latency hiding techniques and functional parallelism within a single application. However, to execute data parallel codes independent of other threads in the system, collective operations and relative indexing among threads are required. This paper describes the design of ropes: a scoping mechanism for collective operations and relative indexing among threads. We present the design of ropes in the context of the Chant system, and provide performance results evaluating our initial design decisions.
Resonance-induced sensitivity enhancement method for conductivity sensors
NASA Technical Reports Server (NTRS)
Tai, Yu-Chong (Inventor); Shih, Chi-yuan (Inventor); Li, Wei (Inventor); Zheng, Siyang (Inventor)
2009-01-01
Methods and systems for improving the sensitivity of a variety of conductivity sensing devices, in particular capacitively-coupled contactless conductivity detectors. A parallel inductor is added to the conductivity sensor. The sensor with the parallel inductor is operated at a resonant frequency of the equivalent circuit model. At the resonant frequency, parasitic capacitances that are either in series or in parallel with the conductance (and possibly a series resistance) are substantially removed from the equivalent circuit, leaving a purely resistive impedance. An appreciably higher sensor sensitivity results. Experimental verification shows that sensitivity improvements of the order of 10,000-fold are possible. Examples of detecting particulates with high precision by application of the apparatus and methods of operation are described.
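The cancellation at resonance can be checked numerically with a simplified circuit. The sketch below assumes only a conductance G in parallel with a parasitic capacitance C and the added inductor L; the series capacitances and series resistance mentioned above are left out. The component values are hypothetical.

```python
import math

def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of a parallel LC pair: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

def admittance(freq_hz, conductance_s, inductance_h, capacitance_f):
    """Admittance of G, L and C in parallel: Y = G + j(wC - 1/(wL))."""
    w = 2.0 * math.pi * freq_hz
    return complex(conductance_s, w * capacitance_f - 1.0 / (w * inductance_h))

# Hypothetical values: 1 mH inductor, 10 pF parasitic capacitance,
# 1 microsiemens solution conductance.
L_H, C_F, G_S = 1e-3, 10e-12, 1e-6

f0 = resonant_frequency_hz(L_H, C_F)
y_resonant = admittance(f0, G_S, L_H, C_F)
y_off_resonance = admittance(f0 / 10.0, G_S, L_H, C_F)
```

At `f0` the imaginary part of the admittance vanishes up to rounding, so the measured signal depends on the conductance alone. Away from resonance the reactive term is orders of magnitude larger than G for these values and masks it.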
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berry, K.R.; Hansen, F.R.; Napolitano, L.M.
1992-01-01
DART (DSP Array for Reconfigurable Tasks) is a parallel architecture of two high-performance DSP (digital signal processing) chips with the flexibility to handle a wide range of real-time applications. Each of the 32-bit floating-point DSP processors in DART is programmable in a high-level language (C or Ada). We have added extensions to the real-time operating system used by DART in order to support parallel processing. The combination of high-level language programmability, a real-time operating system, and parallel processing support significantly reduces the development cost of application software for signal processing and control applications. We have demonstrated this capability by using DART to reconstruct images in the prototype VIP (Video Imaging Projectile) groundstation.
Image Processing Using a Parallel Architecture.
1987-12-01
ENG/87D-25 Abstract This study developed a set of low level image processing tools on a parallel computer that allows concurrent processing of images...environment, the set of tools offers a significant reduction in the time required to perform some commonly used image processing operations. ...step toward developing these systems, a structured set of image processing tools was implemented using a parallel computer. More important than
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
NASA Astrophysics Data System (ADS)
Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.
2010-12-01
Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a de-coupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinements for 2D and 3D structured and unstructured mesh. 
The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches, over de-coupled methods, for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.
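The "Jacobian Free" part of JFNK rests on one approximation: the product of the Jacobian with a vector is estimated from two residual evaluations, so the Jacobian matrix is never assembled. The sketch below shows only that step. The two-unknown residual function is a hypothetical stand-in for a coupled transport-reaction system and has nothing to do with RAT or MOOSE code. In a full solver this product would be handed to a Krylov method such as GMRES inside each Newton iteration, which is not shown here.

```python
def residual(u):
    """Hypothetical nonlinear system F(u) = 0 with two coupled unknowns."""
    x, y = u
    return [x * x + y - 3.0, x + y * y - 5.0]

def jacobian_vector_product(f, u, v, eps=1e-7):
    """Approximate J(u) v by a forward difference: (F(u + eps*v) - F(u)) / eps."""
    f0 = f(u)
    f1 = f([ui + eps * vi for ui, vi in zip(u, v)])
    return [(a - b) / eps for a, b in zip(f1, f0)]

u = [1.0, 2.0]
v = [1.0, 0.0]
jv = jacobian_vector_product(residual, u, v)

# For comparison, the analytic Jacobian at u is [[2x, 1], [1, 2y]],
# so J v with v = [1, 0] is its first column.
jv_exact = [2.0 * u[0], 1.0]
```

Because only residual evaluations are needed, all equations can be kept in one fully coupled residual without deriving or storing the cross-coupling terms of the Jacobian.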
Archer, Charles J.; Inglett, Todd A.; Ratterman, Joseph D.; Smith, Brian E.
2010-03-02
Methods, apparatus, and products are disclosed for configuring compute nodes of a parallel computer in an operational group into a plurality of independent non-overlapping collective networks, the compute nodes in the operational group connected together for data communications through a global combining network, that include: partitioning the compute nodes in the operational group into a plurality of non-overlapping subgroups; designating one compute node from each of the non-overlapping subgroups as a master node; and assigning, to the compute nodes in each of the non-overlapping subgroups, class routing instructions that organize the compute nodes in that non-overlapping subgroup as a collective network such that the master node is a physical root.
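The partitioning and master designation steps can be pictured with a short sketch. It is a plain data-structure illustration under stated assumptions: nodes are identified by integers, subgroups are contiguous slices of a fixed size, and the first node of each subgroup is taken as master. None of these choices come from the abstract, and the class routing instructions themselves are not modeled.

```python
def partition_into_collectives(node_ids, subgroup_size):
    """Split nodes into non-overlapping subgroups, each with one master node."""
    subgroups = [
        node_ids[i:i + subgroup_size]
        for i in range(0, len(node_ids), subgroup_size)
    ]
    # Picking the first node as master is an arbitrary illustrative choice.
    return [{"master": group[0], "members": group} for group in subgroups]

# Hypothetical operational group of 8 compute nodes.
networks = partition_into_collectives(list(range(8)), subgroup_size=3)
masters = [network["master"] for network in networks]
all_members = [node for network in networks for node in network["members"]]
```

Every node lands in exactly one subgroup, which is the non-overlap property the abstract requires before each subgroup is organized as its own collective network rooted at its master.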
The Charlotte (TM) intra-vehicular robot
NASA Technical Reports Server (NTRS)
Swaim, Patrick L.; Thompson, Clark J.; Campbell, Perry D.
1994-01-01
NASA has identified telerobotics and telescience as essential technologies to reduce the crew extra-vehicular activity (EVA) and intra-vehicular activity (IVA) workloads. Under this project, we are developing and flight testing a novel IVA robot to relieve the crew of tedious and routine tasks. Through ground telerobotic control of this robot, we will enable ground researchers to routinely interact with experiments in space. Our approach is to develop an IVA robot system incrementally by employing a series of flight tests with increasing complexity. This approach has the advantages of providing an early IVA capability that can assist the crew, demonstrate capabilities that ground researchers can be confident of in planning for future experiments, and allow incremental refinement of system capabilities and insertion of new technology. In parallel with this approach to flight testing, we seek to establish ground test beds, in which the requirements of payload experimenters can be further investigated. In 1993 we reviewed manifested SpaceHab experiments and defined IVA robot requirements to assist in their operation. We also examined previous IVA robot designs and assessed them against flight requirements. We rejected previous design concepts on the basis of threat to crew safety, operability, and maintainability. Based on this insight, we developed an entirely new concept for IVA robotics, the CHARLOTTE robot system. Ground based testing of a prototype version of the system has already proven its ability to perform most common tasks demanded of the crew, including operation of switches, buttons, knobs, dials, and performing video surveys of experiments and switch panels.
Distributed intelligence for supervisory control
NASA Technical Reports Server (NTRS)
Wolfe, W. J.; Raney, S. D.
1987-01-01
Supervisory control systems must deal with various types of intelligence distributed throughout the layers of control. Typical layers are real-time servo control, off-line planning and reasoning subsystems and finally, the human operator. Design methodologies must account for the fact that the majority of the intelligence will reside with the human operator. Hierarchical decompositions and feedback loops as conceptual building blocks that provide a common ground for man-machine interaction are discussed. Examples of types of parallelism and parallel implementation on several classes of computer architecture are also discussed.
Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R
2011-01-01
This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Analysis of foliage effects on mobile propagation in dense urban environments
NASA Astrophysics Data System (ADS)
Bronshtein, Alexander; Mazar, Reuven; Lu, I.-Tai
2000-07-01
Attempts to reduce the interference level and to increase the spectral efficiency of cellular radio communication systems operating in dense urban and suburban areas lead to the microcellular approach with a consequent requirement to lower antenna heights. In large metropolitan areas having high buildings this requirement causes a situation where the transmitting and receiving antennas are both located below the rooftops, and the city street acts as a type of a waveguiding channel for the propagating signal. In this work, the city street is modeled as a random multislit waveguide with randomly distributed regions of foliage parallel to the building boundaries. The statistical propagation characteristics are expressed in terms of multiple ray-fields approaching the observer. Algorithms for predicting the path-loss along the waveguide and for computing the transverse field structure are presented.
Agents, Bayes, and Climatic Risks - a modular modelling approach
NASA Astrophysics Data System (ADS)
Haas, A.; Jaeger, C.
2005-08-01
When insurance firms, energy companies, governments, NGOs, and other agents strive to manage climatic risks, it is by no means clear what the aggregate outcome should and will be. As a framework for investigating this subject, we present the LAGOM model family. It is based on modules depicting learning social agents. For managing climate risks, our agents use second order probabilities and update them by means of a Bayesian mechanism while differing in priors and risk aversion. The interactions between these modules and the aggregate outcomes of their actions are implemented using further modules. The software system is implemented as a series of parallel processes using the CIAMn approach. It is possible to couple modules irrespective of the language they are written in, the operating system under which they are run, and the physical location of the machine.
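One simple way to picture agents holding second order probabilities and updating them by Bayes' rule is a Beta distribution over the unknown probability of a damaging event. This is an illustrative assumption, not a description of how LAGOM represents beliefs. The agent names, priors, and observation counts below are hypothetical.

```python
def update_beta(alpha, beta, events, trials):
    """Conjugate Bayesian update of a Beta(alpha, beta) belief."""
    return alpha + events, beta + (trials - events)

def mean(alpha, beta):
    """Expected event probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)

# Two agents that differ only in their priors over the event probability.
agents = {
    "optimist": (1.0, 9.0),   # prior mean 0.10
    "pessimist": (3.0, 7.0),  # prior mean 0.30
}

# Both observe the same record: 2 damaging events in 10 periods.
posteriors = {
    name: update_beta(a, b, events=2, trials=10)
    for name, (a, b) in agents.items()
}
expected = {name: mean(a, b) for name, (a, b) in posteriors.items()}
```

After seeing the same data the two agents move toward each other (0.15 and 0.25) but do not agree, which is the kind of heterogeneity in beliefs that makes the aggregate outcome hard to predict.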
Global Detection of Live Virtual Machine Migration Based on Cellular Neural Networks
Xie, Kang; Yang, Yixian; Zhang, Ling; Jing, Maohua; Xin, Yang; Li, Zhongxian
2014-01-01
To meet the demands of monitoring large-scale, autoscaling, and heterogeneous virtual resources in existing cloud computing environments, a new live virtual machine (VM) migration detection algorithm based on cellular neural networks (CNNs) is presented. By analyzing the detection process, the parameter relationship of the CNN is mapped to an optimization problem, which is solved using an improved particle swarm optimization algorithm based on bubble sort. Experimental results demonstrate that the proposed method can display the VM migration process intuitively. Compared with the best-fit heuristic algorithm, this approach reduces the processing time, and the evidence indicates that it is amenable to parallelism and to analog very large scale integration (VLSI) implementation, allowing VM migration detection to be performed better. PMID:24959631
Dual-mode plasmonic nanorod type antenna based on the concept of a trapped dipole.
Panaretos, Anastasios H; Werner, Douglas H
2015-04-06
In this paper we theoretically investigate the feasibility of creating a dual-mode plasmonic nanorod antenna. The proposed design methodology relies on adapting to optical wavelengths the principles of operation of trapped dipole antennas, which have been widely used in the low MHz frequency range. This type of antenna typically employs parallel LC circuits, also referred to as "traps", which are connected along the two arms of the dipole. By judiciously choosing the resonant frequency of these traps, as well as their position along the arms of the dipole, it is feasible to excite the λ/2 resonance of both the original dipole as well as the shorter section defined by the length of wire between the two traps. This effectively enables the dipole antenna to have a dual-mode of operation. Our analysis reveals that the implementation of this concept at the nanoscale requires that two cylindrical pockets (i.e. loading volumes) be introduced along the length of the nanoantenna, inside which plasmonic core-shell particles are embedded. By properly selecting the geometry and constitution of the core-shell particle as well as the constitution of the host material of the two loading volumes and their position along the nanorod, the equivalent effect of a resonant parallel LC circuit can be realized. This effectively enables a dual-mode operation of the nanorod antenna. The proposed methodology introduces a compact approach for the realization of dual-mode optical sensors while at the same time it clearly illustrates the inherent tuning capabilities that core-shell particles can offer in a practical framework.
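The role of the parallel LC "trap" in the classical trapped dipole mentioned above can be sketched numerically. An ideal parallel LC circuit resonates at f0 = 1/(2π√(LC)), where its impedance becomes very large, so the trap acts nearly as an open circuit and isolates the outer part of each arm. The component values and function names below are hypothetical and refer to the low-MHz analogue only, not to the plasmonic core-shell design of the paper.

```python
import math


def trap_resonant_frequency(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency (Hz) of an ideal parallel LC circuit."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def trap_impedance_magnitude(freq_hz: float, inductance_h: float, capacitance_f: float) -> float:
    """|Z| of an ideal (lossless) parallel LC circuit: |jwL / (1 - w^2 LC)|."""
    omega = 2.0 * math.pi * freq_hz
    return abs(omega * inductance_h / (1.0 - omega**2 * inductance_h * capacitance_f))


# Assumed illustrative component values: 10 microhenries and 50 picofarads.
L_TRAP = 10e-6
C_TRAP = 50e-12

f0 = trap_resonant_frequency(L_TRAP, C_TRAP)
z_near = trap_impedance_magnitude(0.999 * f0, L_TRAP, C_TRAP)  # just below resonance
z_far = trap_impedance_magnitude(0.5 * f0, L_TRAP, C_TRAP)     # well below resonance
```

Near f0 the trap impedance is orders of magnitude larger than far from it, so the section between the traps behaves as a shorter dipole at that frequency, while at lower frequencies the full length radiates.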
The Simplified Aircraft-Based Paired Approach With the ALAS Alerting Algorithm
NASA Technical Reports Server (NTRS)
Perry, Raleigh B.; Madden, Michael M.; Torres-Pomales, Wilfredo; Butler, Ricky W.
2013-01-01
This paper presents the results of an investigation of a proposed concept for closely spaced parallel runways called the Simplified Aircraft-based Paired Approach (SAPA). This procedure depends upon a new alerting algorithm called the Adjacent Landing Alerting System (ALAS). This study used both low fidelity and high fidelity simulations to validate the SAPA procedure and test the performance of the new alerting algorithm. The low fidelity simulation enabled a determination of minimum approach distance for the worst case over millions of scenarios. The high fidelity simulation enabled an accurate determination of timings and minimum approach distance in the presence of realistic trajectories, communication latencies, and total system error for 108 test cases. The SAPA procedure and the ALAS alerting algorithm were applied to the 750-ft parallel spacing (e.g., SFO 28L/28R) approach problem. With the SAPA procedure as defined in this paper, this study concludes that a 750-ft application does not appear to be feasible, but preliminary results for 1000-ft parallel runways look promising.
The BLAZE language: A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Examining Parallelism of Sets of Psychometric Measures Using Latent Variable Modeling
ERIC Educational Resources Information Center
Raykov, Tenko; Patelis, Thanos; Marcoulides, George A.
2011-01-01
A latent variable modeling approach that can be used to examine whether several psychometric tests are parallel is discussed. The method consists of sequentially testing the properties of parallel measures via a corresponding relaxation of parameter constraints in a saturated model or an appropriately constructed latent variable model. The…
Evaluation of Parallel Analysis Methods for Determining the Number of Factors
ERIC Educational Resources Information Center
Crawford, Aaron V.; Green, Samuel B.; Levy, Roy; Lo, Wen-Juo; Scott, Lietta; Svetina, Dubravka; Thompson, Marilyn S.
2010-01-01
Population and sample simulation approaches were used to compare the performance of parallel analysis using principal component analysis (PA-PCA) and parallel analysis using principal axis factoring (PA-PAF) to identify the number of underlying factors. Additionally, the accuracies of the mean eigenvalue and the 95th percentile eigenvalue criteria…
SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation
NASA Technical Reports Server (NTRS)
Steinman, Jeff S.
1992-01-01
Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.
Liquid rocket booster integration study. Volume 1: Executive summary
NASA Technical Reports Server (NTRS)
1988-01-01
The impacts of introducing liquid rocket booster engines (LRB) into the Space Transportation System (STS)/Kennedy Space Center (KSC) launch environment are identified and evaluated. Proposed ground systems configurations are presented along with a launch site requirements summary. Prelaunch processing scenarios are described and the required facility modifications and new facility requirements are analyzed. Flight vehicle design recommendations to enhance launch processing are discussed. Processing approaches to integrate LRB with existing STS launch operations are evaluated. The key features and significance of launch site transition to a new STS configuration in parallel with ongoing launch activities are enumerated. This volume is the executive summary of the five volume series.
Liquid rocket booster integration study. Volume 5, part 1: Appendices
NASA Technical Reports Server (NTRS)
1988-01-01
The impacts of introducing liquid rocket booster engines (LRB) into the Space Transportation System (STS)/Kennedy Space Center (KSC) launch environment are identified and evaluated. Proposed ground systems configurations are presented along with a launch site requirements summary. Prelaunch processing scenarios are described and the required facility modifications and new facility requirements are analyzed. Flight vehicle design recommendations to enhance launch processing are discussed. Processing approaches to integrate LRB with existing STS launch operations are evaluated. The key features and significance of launch site transition to a new STS configuration in parallel with ongoing launch activities are enumerated. This volume is the appendices of the five volume series.
Liquid Rocket Booster Integration Study. Volume 2: Study synopsis
NASA Technical Reports Server (NTRS)
1988-01-01
The impacts of introducing liquid rocket booster engines (LRB) into the Space Transportation System (STS)/Kennedy Space Center (KSC) launch environment are identified and evaluated. Proposed ground systems configurations are presented along with a launch site requirements summary. Prelaunch processing scenarios are described and the required facility modifications and new facility requirements are analyzed. Flight vehicle design recommendations to enhance launch processing are discussed. Processing approaches to integrate LRB with existing STS launch operations are evaluated. The key features and significance of launch site transition to a new STS configuration in parallel with ongoing launch activities are enumerated. This volume is the study summary of the five volume series.
Modeling of composite beams and plates for static and dynamic analysis
NASA Technical Reports Server (NTRS)
Hodges, Dewey H.; Atilgan, Ali R.; Lee, Bok Woo
1990-01-01
A rigorous theory and corresponding computational algorithms were developed for a variety of problems regarding the analysis of composite beams and plates. The modeling approach is intended to be applicable to both static and dynamic analysis of generally anisotropic, nonhomogeneous beams and plates. Development of a theory for analysis of the local deformation of plates was the major focus. Some work was performed on global deformation of beams. Because of the strong parallel between beams and plates, the two were treated together as thin bodies, especially in cases where doing so clarifies the meaning of certain terminology and the motivation behind certain mathematical operations.
A Sludge Drum in the APNea System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hensley, D.
1998-11-17
The assay of sludge drums pushes the APNea System to a definite extreme. Even though it seems clear that neutron based assay should be the method of choice for sludge drums, the difficulties posed by this matrix push any NDA technique to its limits. Special emphasis is given here to the differential die-away technique, which appears to approach the desired sensitivity. A parallel analysis of ethafoam drums will be presented, since the ethafoam matrix fits well within the operating range of the APNea System, and, having been part of the early PDP trials, has been assayed by many in the NDA community.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fang, Chin; Corttrell, R. A.
This Technical Note provides an overview of high-performance parallel Big Data transfers with and without encryption for data in transit over multiple network channels. It shows that with the parallel approach, it is feasible to carry out high-performance parallel "encrypted" Big Data transfers without serious impact on throughput. Other impacts, such as energy consumption, should still be investigated. It also explains our rationale for using a statistics-based approach to gain understanding from test results and to improve the system. The presentation is high-level in nature. Nevertheless, at the end we pose some questions and identify potentially fruitful directions for future work.
Distributed and parallel approach for handle and perform huge datasets
NASA Astrophysics Data System (ADS)
Konopko, Joanna
2015-12-01
Big Data refers to dynamic, large, and disparate volumes of data that come from many different sources (tools, machines, sensors, mobile devices) that are uncorrelated with each other. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data. A proper architecture for systems that process huge data sets is needed. In this paper, a comparison of distributed and parallel system architectures is presented using the examples of the MapReduce (MR) Hadoop platform and a parallel database platform (DBMS). This paper also analyzes the problem of processing and handling valuable information from petabytes of data. Both paradigms, MapReduce and parallel DBMS, are described and compared. A hybrid architecture approach is also proposed that could be used to solve the analyzed problem of storing and processing Big Data.
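The MapReduce paradigm compared in the abstract above can be sketched in a few lines. This is a single-process illustration of the map, shuffle, and reduce stages using a word count, not Hadoop code. The function names and the sample records are assumptions made for this example. On a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework does between stages."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records: list[str]) -> dict[str, int]:
    """Run the three stages sequentially over an in-memory list of records."""
    mapped = chain.from_iterable(map_phase(record) for record in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


word_counts = run_job(["big data", "Big parallel data"])
```

Because each `map_phase` call depends only on its own record and each `reduce_phase` call only on its own key, both stages can be distributed without coordination, which is the property that lets this model scale to large datasets.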