multiprocessors: Topics by Science.gov

Sample records for multiprocessors

Embedded Multiprocessor Technology for VHSIC Insertion

NASA Technical Reports Server (NTRS)

Hayes, Paul J.

1990-01-01

Viewgraphs on embedded multiprocessor technology for VHSIC insertion are presented. The objective was to develop multiprocessor system technology providing user-selectable fault tolerance, increased throughput, and ease of application representation for concurrent operation. The approach was to develop graph management mapping theory for proper performance, model multiprocessor performance, and demonstrate performance in selected hardware systems.
Validation of multiprocessor systems

NASA Technical Reports Server (NTRS)

Siewiorek, D. P.; Segall, Z.; Kong, T.

1982-01-01

Experiments that can be used to validate fault free performance of multiprocessor systems in aerospace systems integrating flight controls and avionics are discussed. Engineering prototypes for two fault tolerant multiprocessors are tested.
Programmable partitioning for high-performance coherence domains in a multiprocessor system

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Salapura, Valentina [Chappaqua, NY

2011-01-25

A multiprocessor computing system and a method of logically partitioning a multiprocessor computing system are disclosed. The multiprocessor computing system comprises a multitude of processing units, and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, memory-consistent, adjustable-size processing groups. Preferably, when the processor units are partitioned into these processing groups, the partitioning system also configures the snoop units to maintain cache coherency within each of said groups.
Queueing analysis of a canonical model of real-time multiprocessors

NASA Technical Reports Server (NTRS)

Krishna, C. M.; Shin, K. G.

1983-01-01

A logical classification of multiprocessor structures from the point of view of control applications is presented. A computation of the response time distribution for a canonical model of a real time multiprocessor is presented. The multiprocessor is approximated by a blocking model. Two separate models are derived: one created from the system's point of view, and the other from the point of view of an incoming task.
A simple modern correctness condition for a space-based high-performance multiprocessor

NASA Technical Reports Server (NTRS)

Probst, David K.; Li, Hon F.

1992-01-01

A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Iterative algorithms for tridiagonal matrices on a WSI-multiprocessor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gajski, D.D.; Sameh, A.H.; Wisniewski, J.A.

1982-01-01

With the rapid advances in semiconductor technology, the construction of Wafer Scale Integration (WSI)-multiprocessors consisting of a large number of processors is now feasible. We illustrate the implementation of some basic linear algebra algorithms on such multiprocessors.
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G; Salapura, Valentina

2014-12-02

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU is further programmed to monitor event signals issued from non-processor devices.
A fault-tolerant multiprocessor architecture for aircraft, volume 1. [autopilot configuration

NASA Technical Reports Server (NTRS)

Smith, T. B.; Hopkins, A. L.; Taylor, W.; Ausrotas, R. A.; Lala, J. H.; Hanley, L. D.; Martin, J. H.

1978-01-01

A fault-tolerant multiprocessor architecture is reported. This architecture, together with a comprehensive information system architecture, has important potential for future aircraft applications. A preliminary definition and assessment of a suitable multiprocessor architecture for such applications is developed.
Two Fundamental Issues in Multiprocessing.

DTIC Science & Technology

1987-10-01

Structural Model of a Multiprocessor 6 Figure 5: Operational Model of a Multiprocessor 7 Figure 6: The von Neumann Processor (from Gajski and Peir [201) 10...Computer Society, June, 1983. 20. Gajski , D. D. & J-K. Peir. "Essential Issues in Multiprocessor Systems". Computer 18, 6 (June 1985), 9-27. 21. Gurd
Insertion of coherence requests for debugging a multiprocessor

DOEpatents

Blumrich, Matthias A.; Salapura, Valentina

2010-02-23

A method and system are disclosed to insert coherence events in a multiprocessor computer system, and to present those coherence events to the processors of the multiprocessor computer system for analysis and debugging purposes. The coherence events are inserted in the computer system by adding one or more special insert registers. By writing into the insert registers, coherence events are inserted in the multiprocessor system as if they were generated by the normal coherence protocol. Once these coherence events are processed, the processing of coherence events can continue in the normal operation mode.
Method and apparatus for single-stepping coherence events in a multiprocessor system under software control

DOEpatents

Blumrich, Matthias A.; Salapura, Valentina

2010-11-02

An apparatus and method are disclosed for single-stepping coherence events in a multiprocessor system under software control in order to monitor the behavior of a memory coherence mechanism. Single-stepping coherence events in a multiprocessor system is made possible by adding one or more step registers. By accessing these step registers, one or more coherence requests are processed by the multiprocessor system. The step registers determine if the snoop unit will operate by proceeding in a normal execution mode, or operate in a single-step mode.
Shared versus distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Jordan, Harry F.

1991-01-01

The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation intensive tasks and considers research and implementation trends. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
Multiprocessor computer overset grid method and apparatus

DOEpatents

Barnette, Daniel W.; Ober, Curtis C.

2003-01-01

A multiprocessor computer overset grid method and apparatus comprises associating points in each overset grid with processors and using mapped interpolation transformations to communicate intermediate values between processors assigned base and target points of the interpolation transformations. The method allows a multiprocessor computer to operate with effective load balance on overset grid applications.
A large-grain mapping approach for multiprocessor systems through data flow model. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Kim, Hwa-Soo

1991-01-01

A large-grain level mapping method is presented of numerical oriented applications onto multiprocessor systems. The method is based on the large-grain data flow representation of the input application and it assumes a general interconnection topology of the multiprocessor system. The large-grain data flow model was used because such representation best exhibits inherited parallelism in many important applications, e.g., CFD models based on partial differential equations can be presented in large-grain data flow format, very effectively. A generalized interconnection topology of the multiprocessor architecture is considered, including such architectural issues as interprocessor communication cost, with the aim to identify the 'best matching' between the application and the multiprocessor structure. The objective is to minimize the total execution time of the input algorithm running on the target system. The mapping strategy consists of the following: (1) large-grain data flow graph generation from the input application using compilation techniques; (2) data flow graph partitioning into basic computation blocks; and (3) physical mapping onto the target multiprocessor using a priority allocation scheme for the computation blocks.
High Performance, Dependable Multiprocessor

NASA Technical Reports Server (NTRS)

Ramos, Jeremy; Samson, John R.; Troxel, Ian; Subramaniyan, Rajagopal; Jacobs, Adam; Greco, James; Cieslewski, Grzegorz; Curreri, John; Fischer, Michael; Grobelny, Eric;

2006-01-01

With the ever increasing demand for higher bandwidth and processing capacity of today's space exploration, space science, and defense missions, the ability to efficiently apply commercial-off-the-shelf (COTS) processors for on-board computing is now a critical need. In response to this need, NASA's New Millennium Program office has commissioned the development of Dependable Multiprocessor (DM) technology for use in payload and robotic missions. The Dependable Multiprocessor technology is a COTS-based, power efficient, high performance, highly dependable, fault tolerant cluster computer. To date, Honeywell has successfully demonstrated a TRL4 prototype of the Dependable Multiprocessor [I], and is now working on the development of a TRLS prototype. For the present effort Honeywell has teamed up with the University of Florida's High-performance Computing and Simulation (HCS) Lab, and together the team has demonstrated major elements of the Dependable Multiprocessor TRLS system.

VME rollback hardware for time warp multiprocessor systems

NASA Technical Reports Server (NTRS)

Robb, Michael J.; Buzzell, Calvin A.

1992-01-01

The purpose of the research effort is to develop and demonstrate innovative hardware to implement specific rollback and timing functions required for efficient queue management and precision timekeeping in multiprocessor discrete event simulations. The previously completed phase 1 effort demonstrated the technical feasibility of building hardware modules which eliminate the state saving overhead of the Time Warp paradigm used in distributed simulations on multiprocessor systems. The current phase 2 effort will build multiple pre-production rollback hardware modules integrated with a network of Sun workstations, and the integrated system will be tested by executing a Time Warp simulation. The rollback hardware will be designed to interface with the greatest number of multiprocessor systems possible. The authors believe that the rollback hardware will provide for significant speedup of large scale discrete event simulation problems and allow multiprocessors using Time Warp to dramatically increase performance.
Design and implementation of a modulator-based free-space optical backplane for multiprocessor applications.

PubMed

Kirk, Andrew G; Plant, David V; Szymanski, Ted H; Vranesic, Zvonko G; Tooley, Frank A P; Rolston, David R; Ayliffe, Michael H; Lacroix, Frederic K; Robertson, Brian; Bernier, Eric; Brosseau, Daniel F

2003-05-10

Design and implementation of a free-space optical backplane for multiprocessor applications is presented. The system is designed to interconnect four multiprocessor nodes that communicate by using multiplexed 32-bit packets. Each multiprocessor node is electrically connected to an optoelectronic VLSI chip which implements the hyperplane interconnection architecture. The chips each contain 256 optical transmitters (implemented as dual-rail multiple quantum-well modulators) and 256 optical receivers. A rigid free-space microoptical interconnection system that interconnects the transceiver chips in a 512-channel unidirectional ring is implemented. Full design, implementation, and operational details are provided.
Design and implementation of a modulator-based free-space optical backplane for multiprocessor applications

NASA Astrophysics Data System (ADS)

Kirk, Andrew G.; Plant, David V.; Szymanski, Ted H.; Vranesic, Zvonko G.; Tooley, Frank A. P.; Rolston, David R.; Ayliffe, Michael H.; Lacroix, Frederic K.; Robertson, Brian; Bernier, Eric; Brosseau, Daniel F.

2003-05-01

Design and implementation of a free-space optical backplane for multiprocessor applications is presented. The system is designed to interconnect four multiprocessor nodes that communicate by using multiplexed 32-bit packets. Each multiprocessor node is electrically connected to an optoelectronic VLSI chip which implements the hyperplane interconnection architecture. The chips each contain 256 optical transmitters (implemented as dual-rail multiple quantum-well modulators) and 256 optical receivers. A rigid free-space microoptical interconnection system that interconnects the transceiver chips in a 512-channel unidirectional ring is implemented. Full design, implementation, and operational details are provided.
Operating system for a real-time multiprocessor propulsion system simulator. User's manual

NASA Technical Reports Server (NTRS)

Cole, G. L.

1985-01-01

The NASA Lewis Research Center is developing and evaluating experimental hardware and software systems to help meet future needs for real-time, high-fidelity simulations of air-breathing propulsion systems. Specifically, the real-time multiprocessor simulator project focuses on the use of multiple microprocessors to achieve the required computing speed and accuracy at relatively low cost. Operating systems for such hardware configurations are generally not available. A real time multiprocessor operating system (RTMPOS) that supports a variety of multiprocessor configurations was developed at Lewis. With some modification, RTMPOS can also support various microprocessors. RTMPOS, by means of menus and prompts, provides the user with a versatile, user-friendly environment for interactively loading, running, and obtaining results from a multiprocessor-based simulator. The menu functions are described and an example simulation session is included to demonstrate the steps required to go from the simulation loading phase to the execution phase.
Resource Management for Distributed Parallel Systems

NASA Technical Reports Server (NTRS)

Neuman, B. Clifford; Rao, Santosh

1993-01-01

Multiprocessor systems should exist in the the larger context of distributed systems, allowing multiprocessor resources to be shared by those that need them. Unfortunately, typical multiprocessor resource management techniques do not scale to large networks. The Prospero Resource Manager (PRM) is a scalable resource allocation system that supports the allocation of processing resources in large networks and multiprocessor systems. To manage resources in such distributed parallel systems, PRM employs three types of managers: system managers, job managers, and node managers. There exist multiple independent instances of each type of manager, reducing bottlenecks. The complexity of each manager is further reduced because each is designed to utilize information at an appropriate level of abstraction.

File-System Workload on a Scientific Multiprocessor

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1995-01-01

Many scientific applications have intense computational and I/O requirements. Although multiprocessors have permitted astounding increases in computational performance, the formidable I/O needs of these applications cannot be met by current multiprocessors a their I/O subsystems. To prevent I/O subsystems from forever bottlenecking multiprocessors and limiting the range of feasible applications, new I/O subsystems must be designed. The successful design of computer systems (both hardware and software) depends on a thorough understanding of their intended use. A system designer optimizes the policies and mechanisms for the cases expected to most common in the user's workload. In the case of multiprocessor file systems, however, designers have been forced to build file systems based only on speculation about how they would be used, extrapolating from file-system characterizations of general-purpose workloads on uniprocessor and distributed systems or scientific workloads on vector supercomputers (see sidebar on related work). To help these system designers, in June 1993 we began the Charisma Project, so named because the project sought to characterize 1/0 in scientific multiprocessor applications from a variety of production parallel computing platforms and sites. The Charisma project is unique in recording individual read and write requests-in live, multiprogramming, parallel workloads (rather than from selected or nonparallel applications). In this article, we present the first results from the project: a characterization of the file-system workload an iPSC/860 multiprocessor running production, parallel scientific applications at NASA's Ames Research Center.
Shared performance monitor in a multiprocessor system

DOEpatents

Chiu, George; Gara, Alan G.; Salapura, Valentina

2012-07-24

A performance monitoring unit (PMU) and method for monitoring performance of events occurring in a multiprocessor system. The multiprocessor system comprises a plurality of processor devices units, each processor device for generating signals representing occurrences of events in the processor device, and, a single shared counter resource for performance monitoring. The performance monitor unit is shared by all processor cores in the multiprocessor system. The PMU comprises: a plurality of performance counters each for counting signals representing occurrences of events from one or more the plurality of processor units in the multiprocessor system; and, a plurality of input devices for receiving the event signals from one or more processor devices of the plurality of processor units, the plurality of input devices programmable to select event signals for receipt by one or more of the plurality of performance counters for counting, wherein the PMU is shared between multiple processing units, or within a group of processors in the multiprocessing system. The PMU is further programmed to monitor event signals issued from non-processor devices.
A cache-aided multiprocessor rollback recovery scheme

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent

1989-01-01

This paper demonstrates how previous uniprocessor cache-aided recovery schemes can be applied to multiprocessor architectures, for recovering from transient processor failures, utilizing private caches and a global shared memory. As with cache-aided uniprocessor recovery, the multiprocessor cache-aided recovery scheme of this paper can be easily integrated into standard bus-based snoopy cache coherence protocols. A consistent shared memory state is maintained without the necessity of global check-pointing.
Parallel and fault-tolerant algorithms for hypercube multiprocessors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aykanat, C.

1988-01-01

Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
Realtime multiprocessor for mobile ad hoc networks

NASA Astrophysics Data System (ADS)

Jungeblut, T.; Grünewald, M.; Porrmann, M.; Rückert, U.

2008-05-01

This paper introduces a real-time Multiprocessor System-On-Chip (MPSoC) for low power wireless applications. The multiprocessor is based on eight 32bit RISC processors that are connected via an Network-On-Chip (NoC). The NoC follows a novel approach with guaranteed bandwidth to the application that meets hard realtime requirements. At a clock frequency of 100 MHz the total power consumption of the MPSoC that has been fabricated in 180 nm UMC standard cell technology is 772 mW.
Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 1: FTMP principles of operation

NASA Technical Reports Server (NTRS)

Smith, T. B., Jr.; Lala, J. H.

1983-01-01

The basic organization of the fault tolerant multiprocessor, (FTMP) is that of a general purpose homogeneous multiprocessor. Three processors operate on a shared system (memory and I/O) bus. Replication and tight synchronization of all elements and hardware voting is employed to detect and correct any single fault. Reconfiguration is then employed to repair a fault. Multiple faults may be tolerated as a sequence of single faults with repair between fault occurrences.
Compiler-directed cache management in multiprocessors

NASA Technical Reports Server (NTRS)

Cheong, Hoichi; Veidenbaum, Alexander V.

1990-01-01

The necessity of finding alternatives to hardware-based cache coherence strategies for large-scale multiprocessor systems is discussed. Three different software-based strategies sharing the same goals and general approach are presented. They consist of a simple invalidation approach, a fast selective invalidation scheme, and a version control scheme. The strategies are suitable for shared-memory multiprocessor systems with interconnection networks and a large number of processors. Results of trace-driven simulations conducted on numerical benchmark routines to compare the performance of the three schemes are presented.
ATAMM enhancement and multiprocessor performance evaluation

NASA Technical Reports Server (NTRS)

Stoughton, John W.; Mielke, Roland R.; Som, Sukhamoy; Obando, Rodrigo; Malekpour, Mahyar R.; Jones, Robert L., III; Mandala, Brij Mohan V.

1991-01-01

ATAMM (Algorithm To Architecture Mapping Model) enhancement and multiprocessor performance evaluation is discussed. The following topics are included: the ATAMM model; ATAMM enhancement; ADM (Advanced Development Model) implementation of ATAMM; and ATAMM support tools.
Multiprocessor architectural study

NASA Technical Reports Server (NTRS)

Kosmala, A. L.; Stanten, S. F.; Vandever, W. H.

1972-01-01

An architectural design study was made of a multiprocessor computing system intended to meet functional and performance specifications appropriate to a manned space station application. Intermetrics, previous experience, and accumulated knowledge of the multiprocessor field is used to generate a baseline philosophy for the design of a future SUMC* multiprocessor. Interrupts are defined and the crucial questions of interrupt structure, such as processor selection and response time, are discussed. Memory hierarchy and performance is discussed extensively with particular attention to the design approach which utilizes a cache memory associated with each processor. The ability of an individual processor to approach its theoretical maximum performance is then analyzed in terms of a hit ratio. Memory management is envisioned as a virtual memory system implemented either through segmentation or paging. Addressing is discussed in terms of various register design adopted by current computers and those of advanced design.
Multiprocessor smalltalk: Implementation, performance, and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pallas, J.I.

1990-01-01

Multiprocessor Smalltalk demonstrates the value of object-oriented programming on a multiprocessor. Its implementation and analysis shed light on three areas: concurrent programming in an object oriented language without special extensions, implementation techniques for adapting to multiprocessors, and performance factors in the resulting system. Adding parallelism to Smalltalk code is easy, because programs already use control abstractions like iterators. Smalltalk's basic control and concurrency primitives (lambda expressions, processes and semaphores) can be used to build parallel control abstractions, including parallel iterators, parallel objects, atomic objects, and futures. Language extensions for concurrency are not required. This implementation demonstrates that it is possiblemore » to build an efficient parallel object-oriented programming system and illustrates techniques for doing so. Three modification tools-serialization, replication, and reorganization-adapted the Berkeley Smalltalk interpreter to the Firefly multiprocessor. Multiprocessor Smalltalk's performance shows that the combination of multiprocessing and object-oriented programming can be effective: speedups (relative to the original serial version) exceed 2.0 for five processors on all the benchmarks; the median efficiency is 48%. Analysis shows both where performance is lost and how to improve and generalize the experimental results. Changes in the interpreter to support concurrency add at most 12% overhead; better access to per-process variables could eliminate much of that. Changes in the user code to express concurrency add as much as 70% overhead; this overhead could be reduced to 54% if blocks (lambda expressions) were reentrant. Performance is also lost when the program cannot keep all five processors busy.« less
A fault-tolerant information processing concept for space vehicles.

NASA Technical Reports Server (NTRS)

Hopkins, A. L., Jr.

1971-01-01

A distributed fault-tolerant information processing system is proposed, comprising a central multiprocessor, dedicated local processors, and multiplexed input-output buses connecting them together. The processors in the multiprocessor are duplicated for error detection, which is felt to be less expensive than using coded redundancy of comparable effectiveness. Error recovery is made possible by a triplicated scratchpad memory in each processor. The main multiprocessor memory uses replicated memory for error detection and correction. Local processors use any of three conventional redundancy techniques: voting, duplex pairs with backup, and duplex pairs in independent subsystems.
Cache-based error recovery for shared memory multiprocessor systems

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1989-01-01

A multiprocessor cache-based checkpointing and recovery scheme for of recovering from transient processor errors in a shared-memory multiprocessor with private caches is presented. New implementation techniques that use checkpoint identifiers and recovery stacks to reduce performance degradation in processor utilization during normal execution are examined. This cache-based checkpointing technique prevents rollback propagation, provides for rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions that take error latency into account are presented.
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

Most current multiprocessor file systems are designed to use multiple disks in parallel, using the high aggregate bandwidth to meet the growing I/0 requirements of parallel scientific applications. Many multiprocessor file systems provide applications with a conventional Unix-like interface, allowing the application to access multiple disks transparently. This interface conceals the parallelism within the file system, increasing the ease of programmability, but making it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. In addition to providing an insufficient interface, most current multiprocessor file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic scientific multiprocessor workloads. We discuss Galley's file structure and application interface, as well as the performance advantages offered by that interface.
Preliminary basic performance analysis of the Cedar multiprocessor memory system

NASA Technical Reports Server (NTRS)

Gallivan, K.; Jalby, W.; Turner, S.; Veidenbaum, A.; Wijshoff, H.

1991-01-01

Some preliminary basic results on the performance of the Cedar multiprocessor memory system are presented. Empirical results are presented and used to calibrate a memory system simulator which is then used to discuss the scalability of the system.
Real-Time Multiprocessor Programming Language (RTMPL) user's manual

NASA Technical Reports Server (NTRS)

Arpasi, D. J.

1985-01-01

A real-time multiprocessor programming language (RTMPL) has been developed to provide for high-order programming of real-time simulations on systems of distributed computers. RTMPL is a structured, engineering-oriented language. The RTMPL utility supports a variety of multiprocessor configurations and types by generating assembly language programs according to user-specified targeting information. Many programming functions are assumed by the utility (e.g., data transfer and scaling) to reduce the programming chore. This manual describes RTMPL from a user's viewpoint. Source generation, applications, utility operation, and utility output are detailed. An example simulation is generated to illustrate many RTMPL features.
Validation of fault-free behavior of a reliable multiprocessor system - FTMP: A case study. [Fault-Tolerant Multi-Processor avionics

NASA Technical Reports Server (NTRS)

Clune, E.; Segall, Z.; Siewiorek, D.

1984-01-01

A program of experiments has been conducted at NASA-Langley to test the fault-free performance of a Fault-Tolerant Multiprocessor (FTMP) avionics system for next-generation aircraft. Baseline measurements of an operating FTMP system were obtained with respect to the following parameters: instruction execution time, frame size, and the variation of clock ticks. The mechanisms of frame stretching were also investigated. The experimental results are summarized in a table. Areas of interest for future tests are identified, with emphasis given to the implementation of a synthetic workload generation mechanism on FTMP.
Techniques for video compression

NASA Technical Reports Server (NTRS)

Wu, Chwan-Hwa

1995-01-01

In this report, we present our study on multiprocessor implementation of a MPEG2 encoding algorithm. First, we compare two approaches to implementing video standards, VLSI technology and multiprocessor processing, in terms of design complexity, applications, and cost. Then we evaluate the functional modules of MPEG2 encoding process in terms of their computation time. Two crucial modules are identified based on this evaluation. Then we present our experimental study on the multiprocessor implementation of the two crucial modules. Data partitioning is used for job assignment. Experimental results show that high speedup ratio and good scalability can be achieved by using this kind of job assignment strategy.
Spaceborne VHSIC multiprocessor system for AI applications

NASA Technical Reports Server (NTRS)

Lum, Henry, Jr.; Shrobe, Howard E.; Aspinall, John G.

1988-01-01

A multiprocessor system, under design for space-station applications, makes use of the latest generation symbolic processor and packaging technology. The result will be a compact, space-qualified system two to three orders of magnitude more powerful than present-day symbolic processing systems.
Modelling parallel programs and multiprocessor architectures with AXE

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.

1991-01-01

AXE, An Experimental Environment for Parallel Systems, was designed to model and simulate for parallel systems at the process level. It provides an integrated environment for specifying computation models, multiprocessor architectures, data collection, and performance visualization. AXE is being used at NASA-Ames for developing resource management strategies, parallel problem formulation, multiprocessor architectures, and operating system issues related to the High Performance Computing and Communications Program. AXE's simple, structured user-interface enables the user to model parallel programs and machines precisely and efficiently. Its quick turn-around time keeps the user interested and productive. AXE models multicomputers. The user may easily modify various architectural parameters including the number of sites, connection topologies, and overhead for operating system activities. Parallel computations in AXE are represented as collections of autonomous computing objects known as players. Their use and behavior is described. Performance data of the multiprocessor model can be observed on a color screen. These include CPU and message routing bottlenecks, and the dynamic status of the software.
A Multiprocessor SoC Architecture with Efficient Communication Infrastructure and Advanced Compiler Support for Easy Application Development

NASA Astrophysics Data System (ADS)

Urfianto, Mohammad Zalfany; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki

This paper presentss a Multiprocessor System-on-Chips (MPSoC) architecture used as an execution platform for the new C-language based MPSoC design framework we are currently developing. The MPSoC architecture is based on an existing SoC platform with a commercial RISC core acting as the host CPU. We extend the existing SoC with a multiprocessor-array block that is used as the main engine to run parallel applications modeled in our design framework. Utilizing several optimizations provided by our compiler, an efficient inter-communication between processing elements with minimum overhead is implemented. A host-interface is designed to integrate the existing RISC core to the multiprocessor-array. The experimental results show that an efficacious integration is achieved, proving that the designed communication module can be used to efficiently incorporate off-the-shelf processors as a processing element for MPSoC architectures designed using our framework.

Optical RAM-enabled cache memory and optical routing for chip multiprocessors: technologies and architectures

NASA Astrophysics Data System (ADS)

Pleros, Nikos; Maniotis, Pavlos; Alexoudi, Theonitsa; Fitsios, Dimitris; Vagionas, Christos; Papaioannou, Sotiris; Vyrsokinos, K.; Kanellos, George T.

2014-03-01

The processor-memory performance gap, commonly referred to as "Memory Wall" problem, owes to the speed mismatch between processor and electronic RAM clock frequencies, forcing current Chip Multiprocessor (CMP) configurations to consume more than 50% of the chip real-estate for caching purposes. In this article, we present our recent work spanning from Si-based integrated optical RAM cell architectures up to complete optical cache memory architectures for Chip Multiprocessor configurations. Moreover, we discuss on e/o router subsystems with up to Tb/s routing capacity for cache interconnection purposes within CMP configurations, currently pursued within the FP7 PhoxTrot project.
A general model for memory interference in a multiprocessor system with memory hierarchy

NASA Technical Reports Server (NTRS)

Taha, Badie A.; Standley, Hilda M.

1989-01-01

The problem of memory interference in a multiprocessor system with a hierarchy of shared buses and memories is addressed. The behavior of the processors is represented by a sequence of memory requests with each followed by a determined amount of processing time. A statistical queuing network model for determining the extent of memory interference in multiprocessor systems with clusters of memory hierarchies is presented. The performance of the system is measured by the expected number of busy memory clusters. The results of the analytic model are compared with simulation results, and the correlation between them is found to be very high.
Experimental evaluation of multiprocessor cache-based error recovery

NASA Technical Reports Server (NTRS)

Janssens, Bob; Fuchs, W. K.

1991-01-01

Several variations of cache-based checkpointing for rollback error recovery in shared-memory multiprocessors have been recently developed. By modifying the cache replacement policy, these techniques use the inherent redundancy in the memory hierarchy to periodically checkpoint the computation state. Three schemes, different in the manner in which they avoid rollback propagation, are evaluated. By simulation with address traces from parallel applications running on an Encore Multimax shared-memory multiprocessor, the performance effect of integrating the recovery schemes in the cache coherence protocol are evaluated. The results indicate that the cache-based schemes can provide checkpointing capability with low performance overhead but uncontrollable high variability in the checkpoint interval.
Polymorphous Computing Architectures

DTIC Science & Technology

2007-12-12

provide a multiprocessor implementation. In this work, we introduce the Atomos transactional programming language, which is the first to include...implicit transactions, strong atomicity, and a scalable multiprocessor implementation [47]. Atomos is derived from Java, but replaces its synchronization...and conditional waiting constructs with transactional alternatives. The Atomos conditional waiting proposal is tailored to allow efficient
C-MOS array design techniques: SUMC multiprocessor system study

NASA Technical Reports Server (NTRS)

Clapp, W. A.; Helbig, W. A.; Merriam, A. S.

1972-01-01

The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units.
Static Scheduler for Hard Real-Time Tasks on Multiprocessor Systems

DTIC Science & Technology

1992-09-01

Foundation of Computer Science, 1980 . [SIM83] Simons, B., "Multiprocessor Scheduling of Unit-Time Jobs with Arbitrary Release Times and Deadlines", SIAM...Research Office Attn: Dr. David Hislop P. O. Box 12211 Research Triangle Park, NC 27709-2211 31. Persistent Data Systems 75 W. Chapel Ridge Road Attn: Dr
Fault tree models for fault tolerant hypercube multiprocessors

NASA Technical Reports Server (NTRS)

Boyd, Mark A.; Tuazon, Jezus O.

1991-01-01

Three candidate fault tolerant hypercube architectures are modeled, their reliability analyses are compared, and the resulting implications of these methods of incorporating fault tolerance into hypercube multiprocessors are discussed. In the course of performing the reliability analyses, the use of HARP and fault trees in modeling sequence dependent system behaviors is demonstrated.
Modeling and measurement of fault-tolerant multiprocessors

NASA Technical Reports Server (NTRS)

Shin, K. G.; Woodbury, M. H.; Lee, Y. H.

1985-01-01

The workload effects on computer performance are addressed first for a highly reliable unibus multiprocessor used in real-time control. As an approach to studing these effects, a modified Stochastic Petri Net (SPN) is used to describe the synchronous operation of the multiprocessor system. From this model the vital components affecting performance can be determined. However, because of the complexity in solving the modified SPN, a simpler model, i.e., a closed priority queuing network, is constructed that represents the same critical aspects. The use of this model for a specific application requires the partitioning of the workload into job classes. It is shown that the steady state solution of the queuing model directly produces useful results. The use of this model in evaluating an existing system, the Fault Tolerant Multiprocessor (FTMP) at the NASA AIRLAB, is outlined with some experimental results. Also addressed is the technique of measuring fault latency, an important microscopic system parameter. Most related works have assumed no or a negligible fault latency and then performed approximate analyses. To eliminate this deficiency, a new methodology for indirectly measuring fault latency is presented.
Neural networks and MIMD-multiprocessors

NASA Technical Reports Server (NTRS)

Vanhala, Jukka; Kaski, Kimmo

1990-01-01

Two artificial neural network models are compared. They are the Hopfield Neural Network Model and the Sparse Distributed Memory model. Distributed algorithms for both of them are designed and implemented. The run time characteristics of the algorithms are analyzed theoretically and tested in practice. The storage capacities of the networks are compared. Implementations are done using a distributed multiprocessor system.
Multiprocessor programming environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, M.B.; Fornaro, R.

Programming tools and techniques have been well developed for traditional uniprocessor computer systems. The focus of this research project is on the development of a programming environment for a high speed real time heterogeneous multiprocessor system, with special emphasis on languages and compilers. The new tools and techniques will allow a smooth transition for programmers with experience only on single processor systems.
Shared Versus Distributed Memory Multiprocessors

DTIC Science & Technology

1991-01-01

multiprocessors should hawe shared or dis.trimuted meieo-% ha~ trr ~ g ’’~ de~i c4~accio;, S Cm teaicners argue S trongly tor Outiding (li15 tri huted...Applications, MIT Press (1985). 161 D. Gajski et el., "Cedar," Proc. Compcon, pp. 306-309 (Spring 19S9). 171 S. Ahuja, N. Carriero and D. Gelernter, "Linda
The Minerva Multi-Microprocessor.

DTIC Science & Technology

A multiprocessor system is described which is an experiment in low cost, extensible, multiprocessor architectures. Global issues such as inclusion of a central bus, design of the bus arbiter, and methods of interrupt handling are considered. The system initially includes two processor types, based on microprocessors, and these are discussed. Methods for reducing processor demand for the central bus are described.
FTMP - A highly reliable Fault-Tolerant Multiprocessor for aircraft

NASA Technical Reports Server (NTRS)

Hopkins, A. L., Jr.; Smith, T. B., III; Lala, J. H.

1978-01-01

The FTMP (Fault-Tolerant Multiprocessor) is a complex multiprocessor computer that employs a form of redundancy related to systems considered by Mathur (1971), in which each major module can substitute for any other module of the same type. Despite the conceptual simplicity of the redundancy form, the implementation has many intricacies owing partly to the low target failure rate, and partly to the difficulty of eliminating single-fault vulnerability. An extensive analysis of the computer through the use of such modeling techniques as Markov processes and combinatorial mathematics shows that for random hard faults the computer can meet its requirements. It is also shown that the maintenance scheduled at intervals of 200 hr or more can be adequate most of the time.
Benchmarking GNU Radio Kernels and Multi-Processor Scheduling

DTIC Science & Technology

2013-01-14

AMD E350 APU , comparable to Atom • ARM Cortex A8 running on a Gumstix Overo on an Ettus USRP E110 The general testing procedure consists of • Build...Intel Atom, and the AMD E350 APU . 3.2 Multi-Processor Scheduling Figure 1: GFLOPs per second through an FFT array on an Intel i7. Example output from
A Real-Time Linux for Multicore Platforms

DTIC Science & Technology

2013-12-20

under ARO support) to obtain a fully-functional OS for supporting real-time workloads on multicore platforms. This system, called LITMUS -RT...to be specified as plugin components. LITMUS -RT is open-source software (available at The views, opinions and/or findings contained in this report... LITMUS -RT (LInux Testbed for MUltiprocessor Scheduling in Real-Time systems), allows different multiprocessor real-time scheduling and
Considerations for Multiprocessor Topologies

NASA Technical Reports Server (NTRS)

Byrd, Gregory T.; Delagi, Bruce A.

1987-01-01

Choosing a multiprocessor interconnection topology may depend on high-level considerations, such as the intended application domain and the expected number of processors. It certainly depends on low-level implementation details, such as packaging and communications protocols. The authors first use rough measures of cost and performance to characterize several topologies. They then examine how implementation details can affect the realizable performance of a topology.
A multiprocessor computer simulation model employing a feedback scheduler/allocator for memory space and bandwidth matching and TMR processing

NASA Technical Reports Server (NTRS)

Bradley, D. B.; Irwin, J. D.

1974-01-01

A computer simulation model for a multiprocessor computer is developed that is useful for studying the problem of matching multiprocessor's memory space, memory bandwidth and numbers and speeds of processors with aggregate job set characteristics. The model assumes an input work load of a set of recurrent jobs. The model includes a feedback scheduler/allocator which attempts to improve system performance through higher memory bandwidth utilization by matching individual job requirements for space and bandwidth with space availability and estimates of bandwidth availability at the times of memory allocation. The simulation model includes provisions for specifying precedence relations among the jobs in a job set, and provisions for specifying precedence execution of TMR (Triple Modular Redundant and SIMPLEX (non redundant) jobs.
Dataflow computing approach in high-speed digital simulation

NASA Technical Reports Server (NTRS)

Ercegovac, M. D.; Karplus, W. J.

1984-01-01

New computational tools and methodologies for the digital simulation of continuous systems were explored. Programmability, and cost effective performance in multiprocessor organizations for real time simulation was investigated. Approach is based on functional style languages and data flow computing principles, which allow for the natural representation of parallelism in algorithms and provides a suitable basis for the design of cost effective high performance distributed systems. The objectives of this research are to: (1) perform comparative evaluation of several existing data flow languages and develop an experimental data flow language suitable for real time simulation using multiprocessor systems; (2) investigate the main issues that arise in the architecture and organization of data flow multiprocessors for real time simulation; and (3) develop and apply performance evaluation models in typical applications.
Design and evaluation of a fault-tolerant multiprocessor using hardware recovery blocks

NASA Technical Reports Server (NTRS)

Lee, Y. H.; Shin, K. G.

1982-01-01

A fault-tolerant multiprocessor with a rollback recovery mechanism is discussed. The rollback mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery block is constructed by consecutive state-save operations and several state-save units in every processor and memory module. When a fault is detected, the multiprocessor reconfigures itself to replace the faulty component and then the process originally assigned to the faulty component retreats to one of the previously saved states in order to resume fault-free execution. A mathematical model is proposed to calculate both the coverage of multi-step rollback recovery and the risk of restart. A performance evaluation in terms of task execution time is also presented.
Reproducibility in a multiprocessor system

DOEpatents

Bellofatto, Ralph A; Chen, Dong; Coteus, Paul W; Eisley, Noel A; Gara, Alan; Gooding, Thomas M; Haring, Rudolf A; Heidelberger, Philip; Kopcsay, Gerard V; Liebsch, Thomas A; Ohmacht, Martin; Reed, Don D; Senger, Robert M; Steinmacher-Burow, Burkhard; Sugawara, Yutaka

2013-11-26

Fixing a problem is usually greatly aided if the problem is reproducible. To ensure reproducibility of a multiprocessor system, the following aspects are proposed; a deterministic system start state, a single system clock, phase alignment of clocks in the system, system-wide synchronization events, reproducible execution of system components, deterministic chip interfaces, zero-impact communication with the system, precise stop of the system and a scan of the system state.

Bibliography On Multiprocessors And Distributed Processing

NASA Technical Reports Server (NTRS)

Miya, Eugene N.

1988-01-01

Multiprocessor and Distributed Processing Bibliography package consists of large machine-readable bibliographic data base, which in addition to usual keyword searches, used for producing citations, indexes, and cross-references. Data base contains UNIX(R) "refer" -formatted ASCII data and implemented on any computer running under UNIX(R) operating system. Easily convertible to other operating systems. Requires approximately one megabyte of secondary storage. Bibliography compiled in 1985.
Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications

DTIC Science & Technology

2009-05-01

Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications Gilbert Hendry†, Shoaib Kamil‡?, Aleksandr Biberman†, Johnnie...electronic networks -on-chip warrants investigating real application traces on functionally compa- rable photonic and electronic network designs. We... network can achieve 75× improvement in energy ef- ficiency for synthetic benchmarks and up to 37× improve- ment for real scientific applications
Method for wiring allocation and switch configuration in a multiprocessor environment

DOEpatents

Aridor, Yariv [Zichron Ya'akov, IL; Domany, Tamar [Kiryat Tivon, IL; Frachtenberg, Eitan [Jerusalem, IL; Gal, Yoav [Haifa, IL; Shmueli, Edi [Haifa, IL; Stockmeyer, legal representative, Robert E.; Stockmeyer, Larry Joseph [San Jose, CA

2008-07-15

A method for wiring allocation and switch configuration in a multiprocessor computer, the method including employing depth-first tree traversal to determine a plurality of paths among a plurality of processing elements allocated to a job along a plurality of switches and wires in a plurality of D-lines, and selecting one of the paths in accordance with at least one selection criterion.
Generic Software for Emulating Multiprocessor Architectures.

DTIC Science & Technology

1985-05-01

RD-A157 662 GENERIC SOFTWARE FOR EMULATING MULTIPROCESSOR 1/2 AlRCHITECTURES(J) MASSACHUSETTS INST OF TECH CAMBRIDGE U LRS LAB FOR COMPUTER SCIENCE R...AREA & WORK UNIT NUMBERS MIT Laboratory for Computer Science 545 Technology Square Cambridge, MA 02139 ____________ I I. CONTROLLING OFFICE NAME AND...aide If neceeasy end Identify by block number) Computer architecture, emulation, simulation, dataf low 20. ABSTRACT (Continue an reverse slde It
Adaptive Backoff Synchronization Techniques

DTIC Science & Technology

1989-07-01

The Simple Code. Technical Report, Lawrence Livermore Laboratory, February 1978. [6] F. Darems-Rogers, D. A. George, V. A. Norton, and G . F. Pfister...Heights, November 1986. 20 [7] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In International...17] Janak H. Patel. Analysis of Multiprocessors with Private Cache Memories. IEEE Transactions on Com- puters, C-31(4):296-304, April 1982. [18] G
Framework for analysis of guaranteed QOS systems

NASA Astrophysics Data System (ADS)

Chaudhry, Shailender; Choudhary, Alok

1997-01-01

Multimedia data is isochronous in nature and entails managing and delivering high volumes of data. Multiprocessors with their large processing power, vast memory, and fast interconnects, are an ideal candidate for the implementation of multimedia applications. Initially, multiprocessors were designed to execute scientific programs and thus their architecture was optimized to provide low message latency and efficiently support regular communication patterns. Hence, they have a regular network topology and most use wormhole routing. The design offers the benefits of a simple router, small buffer size, and network latency that is almost independent of path length. Among the various multimedia applications, video on demand (VOD) server is well-suited for implementation using parallel multiprocessors. Logical models for VOD servers are presently mapped onto multiprocessors. Our paper provides a framework for calculating bounds on utilization of system resources with which QoS parameters for each isochronous stream can be guaranteed. Effects of the architecture of multiprocessors, and efficiency of various local models and mapping on particular architectures can be investigated within our framework. Our framework is based on rigorous proofs and provides tight bounds. The results obtained may be used as the basis for admission control tests. To illustrate the versatility of our framework, we provide bounds on utilization for various logical models applied to mesh connected architectures for a video on demand server. Our results show that worm hole routing can lead to packets waiting for transmission of other packets that apparently share no common resources. This situation is analogous to head-of-the-line blocking. We find that the provision of multiple VCs per link and multiple flit buffers improves utilization (even under guaranteed QoS parameters). This analogous to parallel iterative matching.
Parallelising a molecular dynamics algorithm on a multi-processor workstation

NASA Astrophysics Data System (ADS)

Müller-Plathe, Florian

1990-12-01

The Verlet neighbour-list algorithm is parallelised for a multi-processor Hewlett-Packard/Apollo DN10000 workstation. The implementation makes use of memory shared between the processors. It is a genuine master-slave approach by which most of the computational tasks are kept in the master process and the slaves are only called to do part of the nonbonded forces calculation. The implementation features elements of both fine-grain and coarse-grain parallelism. Apart from three calls to library routines, two of which are standard UNIX calls, and two machine-specific language extensions, the whole code is written in standard Fortran 77. Hence, it may be expected that this parallelisation concept can be transfered in parts or as a whole to other multi-processor shared-memory computers. The parallel code is routinely used in production work.
Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak

1996-01-01

Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Scheduling for Locality in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

Submitted in Partial Fulfillment of the Requirements for the Degree ’)iIC Q(JALfryT INSPECTED 5 DOCTOR OF PHILOSOPHY I Accesion For Supervised by NTIS CRAM... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to 0...decomoosition and scheduling algorithms. I. SUIUECT TERMS IS. NUMBER OF PAGES shared-memory multiprocessors; architecture trends; loop 110 scheduling
The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor

DTIC Science & Technology

1991-06-01

Symposium on Compiler Construction, June 1986. [14] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In...Directory Methods. In Proceedings 17th Annual International Symposium on Computer Architecture, June 1990. [31] G . M. Papadopoulos and D.E. Culler...Monsoon: An Explicit Token-Store Ar- chitecture. In Proceedings 17th Annual International Symposium on Computer Architecture, June 1990. [32] G . F
Adaptive Backoff Synchronization Techniques

DTIC Science & Technology

1989-06-01

The Simple Code. Technical Report, Lawrence Livermore Laboratory, February 1978. [6J F. Darems-Rogers, D. A. George, V. A. Norton, and G . F. Pfister...Heights, November 1986. 20 [7] Daniel Gajski , David Kuck, Duncan Lawrie, and Ahmed Saleh. Cedar - A Large Scale Multiprocessor. In International Conference...17] Janak H. Patel. Analysis of Multiprocessors with Private Cache Memories. IEEE Transactions on Com- puters, C-31(4):296-304, April 1982. [18] G
Operating system for a real-time multiprocessor propulsion system simulator

NASA Technical Reports Server (NTRS)

Cole, G. L.

1984-01-01

The success of the Real Time Multiprocessor Operating System (RTMPOS) in the development and evaluation of experimental hardware and software systems for real time interactive simulation of air breathing propulsion systems was evaluated. The Real Time Multiprocessor Operating System (RTMPOS) provides the user with a versatile, interactive means for loading, running, debugging and obtaining results from a multiprocessor based simulator. A front end processor (FEP) serves as the simulator controller and interface between the user and the simulator. These functions are facilitated by the RTMPOS which resides on the FEP. The RTMPOS acts in conjunction with the FEP's manufacturer supplied disk operating system that provides typical utilities like an assembler, linkage editor, text editor, file handling services, etc. Once a simulation is formulated, the RTMPOS provides for engineering level, run time operations such as loading, modifying and specifying computation flow of programs, simulator mode control, data handling and run time monitoring. Run time monitoring is a powerful feature of RTMPOS that allows the user to record all actions taken during a simulation session and to receive advisories from the simulator via the FEP. The RTMPOS is programmed mainly in PASCAL along with some assembly language routines. The RTMPOS software is easily modified to be applicable to hardware from different manufacturers.
The FORCE - A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Reducing Response Time Bounds for DAG-Based Task Systems on Heterogeneous Multicore Platforms

DTIC Science & Technology

2016-01-01

synchronous parallel tasks on multicore platforms. In 25th ECRTS, 2013. [10] U. Devi. Soft Real - Time Scheduling on Multiprocessors. PhD thesis...report, Washington University in St Louis, 2014. [18] C. Liu and J. Anderson. Supporting soft real - time DAG-based sys- tems on multiprocessors with...analysis for DAG-based real - time task systems im- plemented on heterogeneous multicore platforms. The spe- cific analysis problem that is considered was
Graphics applications utilizing parallel processing

NASA Technical Reports Server (NTRS)

Rice, John R.

1990-01-01

The results are presented of research conducted to develop a parallel graphic application algorithm to depict the numerical solution of the 1-D wave equation, the vibrating string. The research was conducted on a Flexible Flex/32 multiprocessor and a Sequent Balance 21000 multiprocessor. The wave equation is implemented using the finite difference method. The synchronization issues that arose from the parallel implementation and the strategies used to alleviate the effects of the synchronization overhead are discussed.
Dynamic file-access characteristics of a production parallel scientific workload

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1994-01-01

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
Safe and Efficient Support for Embeded Multi-Processors in ADA

NASA Astrophysics Data System (ADS)

Ruiz, Jose F.

2010-08-01

New software demands increasing processing power, and multi-processor platforms are spreading as the answer to achieve the required performance. Embedded real-time systems are also subject to this trend, but in the case of real-time mission-critical systems, the properties of reliability, predictability and analyzability are also paramount. The Ada 2005 language defined a subset of its tasking model, the Ravenscar profile, that provides the basis for the implementation of deterministic and time analyzable applications on top of a streamlined run-time system. This Ravenscar tasking profile, originally designed for single processors, has proven remarkably useful for modelling verifiable real-time single-processor systems. This paper proposes a simple extension to the Ravenscar profile to support multi-processor systems using a fully partitioned approach. The implementation of this scheme is simple, and it can be used to develop applications amenable to schedulability analysis.
Power-Aware Compiler Controllable Chip Multiprocessor

NASA Astrophysics Data System (ADS)

Shikano, Hiroaki; Shirako, Jun; Wada, Yasutaka; Kimura, Keiji; Kasahara, Hironori

A power-aware compiler controllable chip multiprocessor (CMP) is presented and its performance and power consumption are evaluated with the optimally scheduled advanced multiprocessor (OSCAR) parallelizing compiler. The CMP is equipped with power control registers that change clock frequency and power supply voltage to functional units including processor cores, memories, and an interconnection network. The OSCAR compiler carries out coarse-grain task parallelization of programs and reduces power consumption using architectural power control support and the compiler's power saving scheme. The performance evaluation shows that MPEG-2 encoding on the proposed CMP with four CPUs results in 82.6% power reduction in real-time execution mode with a deadline constraint on its sequential execution time. Furthermore, MP3 encoding on a heterogeneous CMP with four CPUs and four accelerators results in 53.9% power reduction at 21.1-fold speed-up in performance against its sequential execution in the fastest execution mode.
A Multiprocessor Operating System Simulator

NASA Technical Reports Server (NTRS)

Johnston, Gary M.; Campbell, Roy H.

1988-01-01

This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall semester of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT&T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows that of the 'Choices' family of operating systems for loosely- and tightly-coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.
Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design

NASA Astrophysics Data System (ADS)

Debes, Eric; Kaine, Greg

2002-11-01

In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using Intel® VTuneTM performance analyzer. The experimental measurements are compared to theoretical results.

Cache as point of coherence in multiprocessor system

DOEpatents

Blumrich, Matthias A.; Ceze, Luis H.; Chen, Dong; Gara, Alan; Heidelberger, Phlip; Ohmacht, Martin; Steinmacher-Burow, Burkhard; Zhuang, Xiaotong

2016-11-29

In a multiprocessor system, a conflict checking mechanism is implemented in the L2 cache memory. Different versions of speculative writes are maintained in different ways of the cache. A record of speculative writes is maintained in the cache directory. Conflict checking occurs as part of directory lookup. Speculative versions that do not conflict are aggregated into an aggregated version in a different way of the cache. Speculative memory access requests do not go to main memory.
Multiprocessor Real-Time Locking Protocols for Replicated Resources

DTIC Science & Technology

2016-07-01

circular buffer of slots, each representing a discrete segment of time . For example, if the maintenance of a timing wheel occurs af- ter an interrupt ...Experimental Evaluation To evaluate Algs. 2, 3, and 4, we conducted a series of ex- periments in which we measured relevant overheads and blocking times . We...Multiprocessor Real- Time Locking Protocols for Replicated Resources ∗ Catherine E. Jarrett1, Kecheng Yang1, Ming Yang1, Pontus Ekberg2, and James H
Cedar-a large scale multiprocessor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gajski, D.; Kuck, D.; Lawrie, D.

1983-01-01

This paper presents an overview of Cedar, a large scale multiprocessor being designed at the University of Illinois. This machine is designed to accommodate several thousand high performance processors which are capable of working together on a single job, or they can be partitioned into groups of processors where each group of one or more processors can work on separate jobs. Various aspects of the machine are described including the control methodology, communication network, optimizing compiler and plans for construction. 13 references.
The force on the flex: Global parallelism and portability

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1986-01-01

A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment.
Backend Control Processor for a Multi-Processor Relational Database Computer System.

DTIC Science & Technology

1984-12-01

SCHOOL OF ENGI. UNCRSIFID MPONTIFF DEC 84 AFXT/GCS/ENG/84D-22 F/O 9/2 L ommhhhhmhhml mhhhommhhhhhm i-2 8 -- U0. 11111= Q. 2 111.8IIII- 1111111..6...THESIS Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology Air University In Partial Fulfillment of the...development of a Backend Multi-Processor Relational Database Computer System. This thesis addresses a single component of this system, the Backend Control
Visual monitoring of autonomous life sciences experimentation

NASA Technical Reports Server (NTRS)

Blank, G. E.; Martin, W. N.

1987-01-01

The design and implementation of a computerized visual monitoring system to aid in the monitoring and control of life sciences experiments on board a space station was investigated. A likely multiprocessor design was chosen, a plausible life science experiment with which to work was defined, the theoretical issues involved in the programming of a visual monitoring system for the experiment was considered on the multiprocessor, a system for monitoring the experiment was designed, and simulations of such a system was implemented on a network of Apollo workstations.
A CAMAC-VME-Macintosh data acquisition system for nuclear experiments

NASA Astrophysics Data System (ADS)

Anzalone, A.; Giustolisi, F.

1989-10-01

A multiprocessor system for data acquisition and analysis in low-energy nuclear physics has been realized. The system is built around CAMAC, the VMEbus, and the Macintosh PC. Multiprocessor software has been developed, using RTF, MACsys, and CERN cross-software. The execution of several programs that run on several VME CPUs and on an external PC is coordinated by a mailbox protocol. No operating system is used on the VME CPUs. The hardware, software, and system performance are described.
Performance Evaluation of Parallel Algorithms and Architectures in Concurrent Multiprocessor Systems

DTIC Science & Technology

1988-09-01

HEP and Other Parallel processors, Report no. ANL-83-97, Argonne National Laboratory, Argonne, Ill. 1983. [19] Davidson, G . S. A Practical Paradigm for...IEEE Comp. Soc., 1986. [241 Peir, Jih-kwon, and D. Gajski , "CAMP: A Programming Aide For Multiprocessors," Proc. 1986 ICPP, IEEE Comp. Soc., pp475...482. [251 Pfister, G . F., and V. A. Norton, "Hot Spot Contention and Combining in Multistage Interconnection Networks,"IEEE Trans. Comp., C-34, Oct
Abnormal fault-recovery characteristics of the fault-tolerant multiprocessor uncovered using a new fault-injection methodology

NASA Technical Reports Server (NTRS)

Padilla, Peter A.

1991-01-01

An investigation was made in AIRLAB of the fault handling performance of the Fault Tolerant MultiProcessor (FTMP). Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once in every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles Byzantine or lying faults. Byzantine faults behave such that the faulted unit points to a working unit as the source of errors. The design's problems involve: (1) the design and interface between the simplex error detection hardware and the error processing software, (2) the functional capabilities of the FTMP system bus, and (3) the communication requirements of a multiprocessor architecture. These weak areas in the FTMP's design increase the probability that, for any hardware fault, a good line replacement unit (LRU) is mistakenly disabled by the fault management software.
A multiprocessor operating system simulator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnston, G.M.; Campbell, R.H.

1988-01-01

This paper describes a multiprocessor operating system simulator that was developed by the authors in the Fall of 1987. The simulator was built in response to the need to provide students with an environment in which to build and test operating system concepts as part of the coursework of a third-year undergraduate operating systems course. Written in C++, the simulator uses the co-routine style task package that is distributed with the AT and T C++ Translator to provide a hierarchy of classes that represents a broad range of operating system software and hardware components. The class hierarchy closely follows thatmore » of the Choices family of operating systems for loosely and tightly coupled multiprocessors. During an operating system course, these classes are refined and specialized by students in homework assignments to facilitate experimentation with different aspects of operating system design and policy decisions. The current implementation runs on the IBM RT PC under 4.3bsd UNIX.« less
Solution of large nonlinear quasistatic structural mechanics problems on distributed-memory multiprocessor computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blanford, M.

1997-12-31

Most commercially-available quasistatic finite element programs assemble element stiffnesses into a global stiffness matrix, then use a direct linear equation solver to obtain nodal displacements. However, for large problems (greater than a few hundred thousand degrees of freedom), the memory size and computation time required for this approach becomes prohibitive. Moreover, direct solution does not lend itself to the parallel processing needed for today`s multiprocessor systems. This talk gives an overview of the iterative solution strategy of JAS3D, the nonlinear large-deformation quasistatic finite element program. Because its architecture is derived from an explicit transient-dynamics code, it does not ever assemblemore » a global stiffness matrix. The author describes the approach he used to implement the solver on multiprocessor computers, and shows examples of problems run on hundreds of processors and more than a million degrees of freedom. Finally, he describes some of the work he is presently doing to address the challenges of iterative convergence for ill-conditioned problems.« less
A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer

NASA Astrophysics Data System (ADS)

Xiao, Hao; Isshiki, Tsuyoshi; Khan, Arif Ullah; Li, Dongju; Kunieda, Hiroaki; Nakase, Yuko; Kimura, Sadahiro

Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises a lot of facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large amount of computational load, which challenges the traditional uniprocessor architecture based implementation method to provide the required performance. However, the constrained cost and power budget, on the other hand, makes using commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration and hardware architecture, is presented for the implementation of UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving, but takes 15% less area than the uniprocessor implementation.
Analysis of a Multiprocessor Guidance Computer. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Maltach, E. G.

1969-01-01

The design of the next generation of spaceborne digital computers is described. It analyzes a possible multiprocessor computer configuration. For the analysis, a set of representative space computing tasks was abstracted from the Lunar Module Guidance Computer programs as executed during the lunar landing, from the Apollo program. This computer performs at this time about 24 concurrent functions, with iteration rates from 10 times per second to once every two seconds. These jobs were tabulated in a machine-independent form, and statistics of the overall job set were obtained. It was concluded, based on a comparison of simulation and Markov results, that the Markov process analysis is accurate in predicting overall trends and in configuration comparisons, but does not provide useful detailed information in specific situations. Using both types of analysis, it was determined that the job scheduling function is a critical one for efficiency of the multiprocessor. It is recommended that research into the area of automatic job scheduling be performed.
A language comparison for scientific computing on MIMD architectures

NASA Technical Reports Server (NTRS)

Jones, Mark T.; Patrick, Merrell L.; Voigt, Robert G.

1989-01-01

Choleski's method for solving banded symmetric, positive definite systems is implemented on a multiprocessor computer using three FORTRAN based parallel programming languages, the Force, PISCES and Concurrent FORTRAN. The capabilities of the language for expressing parallelism and their user friendliness are discussed, including readability of the code, debugging assistance offered, and expressiveness of the languages. The performance of the different implementations is compared. It is argued that PISCES, using the Force for medium-grained parallelism, is the appropriate choice for programming Choleski's method on the multiprocessor computer, Flex/32.
The MSFC UNIVAC 1108 EXEC 8 simulation model

NASA Technical Reports Server (NTRS)

Williams, T. G.; Richards, F. M.; Weatherbee, J. E.; Paul, L. K.

1972-01-01

A model is presented which simulates the MSFC Univac 1108 multiprocessor system. The hardware/operating system is described to enable a good statistical measurement of the system behavior. The performance of the 1108 is evaluated by performing twenty-four different experiments designed to locate system bottlenecks and also to test the sensitivity of system throughput with respect to perturbation of the various Exec 8 scheduling algorithms. The model is implemented in the general purpose system simulation language and the techniques described can be used to assist in the design, development, and evaluation of multiprocessor systems.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

DOEpatents

Ohmacht, Martin

2017-08-15

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Generation-based memory synchronization in a multiprocessor system with weakly consistent memory accesses

DOEpatents

Ohmacht, Martin

2014-09-09

In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.
Multiprocessor system with multiple concurrent modes of execution

DOEpatents

Ahn, Daniel; Ceze, Luis H; Chen, Dong; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

2013-12-31

A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
Multiprocessor system with multiple concurrent modes of execution

DOEpatents

Ahn, Daniel; Ceze, Luis H.; Chen, Dong Chen; Gara, Alan; Heidelberger, Philip; Ohmacht, Martin

2016-11-22

A multiprocessor system supports multiple concurrent modes of speculative execution. Speculation identification numbers (IDs) are allocated to speculative threads from a pool of available numbers. The pool is divided into domains, with each domain being assigned to a mode of speculation. Modes of speculation include TM, TLS, and rollback. Allocation of the IDs is carried out with respect to a central state table and using hardware pointers. The IDs are used for writing different versions of speculative results in different ways of a set in a cache memory.
Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 3: FTMP test and evaluation

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, T. B., III

1983-01-01

The experimental test and evaluation of the Fault-Tolerant Multiprocessor (FTMP) is described. Major objectives of this exercise include expanding validation envelope, building confidence in the system, revealing any weaknesses in the architectural concepts and in their execution in hardware and software, and in general, stressing the hardware and software. To this end, pin-level faults were injected into one LRU of the FTMP and the FTMP response was measured in terms of fault detection, isolation, and recovery times. A total of 21,055 stuck-at-0, stuck-at-1 and invert-signal faults were injected in the CPU, memory, bus interface circuits, Bus Guardian Units, and voters and error latches. Of these, 17,418 were detected. At least 80 percent of undetected faults are estimated to be on unused pins. The multiprocessor identified all detected faults correctly and recovered successfully in each case. Total recovery time for all faults averaged a little over one second. This can be reduced to half a second by including appropriate self-tests.

Conference on Real-Time Computer Applications in Nuclear, Particle and Plasma Physics, 6th, Williamsburg, VA, May 15-19, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Pordes, Ruth (Editor)

1989-01-01

Papers on real-time computer applications in nuclear, particle, and plasma physics are presented, covering topics such as expert systems tactics in testing FASTBUS segment interconnect modules, trigger control in a high energy physcis experiment, the FASTBUS read-out system for the Aleph time projection chamber, a multiprocessor data acquisition systems, DAQ software architecture for Aleph, a VME multiprocessor system for plasma control at the JT-60 upgrade, and a multiasking, multisinked, multiprocessor data acquisition front end. Other topics include real-time data reduction using a microVAX processor, a transputer based coprocessor for VEDAS, simulation of a macropipelined multi-CPU event processor for use in FASTBUS, a distributed VME control system for the LISA superconducting Linac, a distributed system for laboratory process automation, and a distributed system for laboratory process automation. Additional topics include a structure macro assembler for the event handler, a data acquisition and control system for Thomson scattering on ATF, remote procedure execution software for distributed systems, and a PC-based graphic display real-time particle beam uniformity.
Performances of multiprocessor multidisk architectures for continuous media storage

NASA Astrophysics Data System (ADS)

Gennart, Benoit A.; Messerli, Vincent; Hersch, Roger D.

1996-03-01

Multimedia interfaces increase the need for large image databases, capable of storing and reading streams of data with strict synchronicity and isochronicity requirements. In order to fulfill these requirements, we consider a parallel image server architecture which relies on arrays of intelligent disk nodes, each disk node being composed of one processor and one or more disks. This contribution analyzes through bottleneck performance evaluation and simulation the behavior of two multi-processor multi-disk architectures: a point-to-point architecture and a shared-bus architecture similar to current multiprocessor workstation architectures. We compare the two architectures on the basis of two multimedia algorithms: the compute-bound frame resizing by resampling and the data-bound disk-to-client stream transfer. The results suggest that the shared bus is a potential bottleneck despite its very high hardware throughput (400Mbytes/s) and that an architecture with addressable local memories located closely to their respective processors could partially remove this bottleneck. The point- to-point architecture is scalable and able to sustain high throughputs for simultaneous compute- bound and data-bound operations.
Communication Studies of DMP and SMP Machines

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors are consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
A multiprocessor airborne lidar data system

NASA Technical Reports Server (NTRS)

Wright, C. W.; Bailey, S. A.; Heath, G. E.; Piazza, C. R.

1988-01-01

A new multiprocessor data acquisition system was developed for the existing Airborne Oceanographic Lidar (AOL). This implementation simultaneously utilizes five single board 68010 microcomputers, the UNIX system V operating system, and the real time executive VRTX. The original data acquisition system was implemented on a Hewlett Packard HP 21-MX 16 bit minicomputer using a multi-tasking real time operating system and a mixture of assembly and FORTRAN languages. The present collection of data sources produce data at widely varied rates and require varied amounts of burdensome real time processing and formatting. It was decided to replace the aging HP 21-MX minicomputer with a multiprocessor system. A new and flexible recording format was devised and implemented to accommodate the constantly changing sensor configuration. A central feature of this data system is the minimization of non-remote sensing bus traffic. Therefore, it is highly desirable that each micro be capable of functioning as much as possible on-card or via private peripherals. The bus is used primarily for the transfer of remote sensing data to or from the buffer queue.
Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer

NASA Technical Reports Server (NTRS)

Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw

2000-01-01

Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
Multitasking kernel for the C and Fortran programming languages

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brooks, E.D. III

1984-09-01

A multitasking kernel for the C and Fortran programming languages which runs on the Unix operating system is presented. The kernel provides a multitasking environment which serves two purposes. The first is to provide an efficient portable environment for the coding, debugging and execution of production multiprocessor programs. The second is to provide a means of evaluating the performance of a multitasking program on model multiprocessors. The performance evaluation features require no changes in the source code of the application and are implemented as a set of compile and run time options in the kernel.
Distributed parallel messaging for multiprocessor systems

DOEpatents

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
Programming parallel architectures: The BLAZE family of languages

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush

1988-01-01

Programming multiprocessor architectures is a critical research issue. An overview is given of the various approaches to programming these architectures that are currently being explored. It is argued that two of these approaches, interactive programming environments and functional parallel languages, are particularly attractive since they remove much of the burden of exploiting parallel architectures from the user. Also described is recent work by the author in the design of parallel languages. Research on languages for both shared and nonshared memory multiprocessors is described, as well as the relations of this work to other current language research projects.
Automatic Data Partitioning on Distributed Memory Multiprocessors

DTIC Science & Technology

1990-10-01

DISTRIBUTED MEMORY MULTIPROCESSORS D 1"’ 1 C . Manish Gupta NOV,1 41990.NOV 1 41990m Prithviraj Banerjee D Coordinated Science Laboratory College of...developed on the partitioning of arrays can as well be applied to other programming languages, such as C . 3 The rest of this paper is organized as follows...value 1, as in Fortran. a) N= 4, N 2 = 1: f(i) = J,(j) = 03 b) =Ni 1, N 2 =4: fA(i) =, f() - c ) NI 2, X) 2: f()=[., f2(j) = [L-.j d) N 1 , N 2 =4: fA(i
Multiprocessor and memory architecture of the neurocomputer SYNAPSE-1.

PubMed

Ramacher, U; Raab, W; Anlauf, J; Hachmann, U; Beichter, J; Brüls, N; Wesseling, M; Sicheneder, E; Männer, R; Glass, J

1993-12-01

A general purpose neurocomputer, SYNAPSE-1, which exhibits a multiprocessor and memory architecture is presented. It offers wide flexibility with respect to neural algorithms and a speed-up factor of several orders of magnitude--including learning. The computational power is provided by a 2-dimensional systolic array of neural signal processors. Since the weights are stored outside these NSPs, memory size and processing power can be adapted individually to the application needs. A neural algorithms programming language, embedded in C(+2) has been defined for the user to cope with the neurocomputer. In a benchmark test, the prototype of SYNAPSE-1 was 8000 times as fast as a standard workstation.
An efficient 3-dim FFT for plane wave electronic structure calculations on massively parallel machines composed of multiprocessor nodes

NASA Astrophysics Data System (ADS)

Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry

2003-08-01

Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining a high performance on a large numbers of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of a shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Automatic partitioning of unstructured meshes for the parallel solution of problems in computational mechanics

NASA Technical Reports Server (NTRS)

Farhat, Charbel; Lesoinne, Michel

1993-01-01

Most of the recently proposed computational methods for solving partial differential equations on multiprocessor architectures stem from the 'divide and conquer' paradigm and involve some form of domain decomposition. For those methods which also require grids of points or patches of elements, it is often necessary to explicitly partition the underlying mesh, especially when working with local memory parallel processors. In this paper, a family of cost-effective algorithms for the automatic partitioning of arbitrary two- and three-dimensional finite element and finite difference meshes is presented and discussed in view of a domain decomposed solution procedure and parallel processing. The influence of the algorithmic aspects of a solution method (implicit/explicit computations), and the architectural specifics of a multiprocessor (SIMD/MIMD, startup/transmission time), on the design of a mesh partitioning algorithm are discussed. The impact of the partitioning strategy on load balancing, operation count, operator conditioning, rate of convergence and processor mapping is also addressed. Finally, the proposed mesh decomposition algorithms are demonstrated with realistic examples of finite element, finite volume, and finite difference meshes associated with the parallel solution of solid and fluid mechanics problems on the iPSC/2 and iPSC/860 multiprocessors.
Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

NASA Technical Reports Server (NTRS)

Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

1989-01-01

Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculation are repeated every time step. However, we cannot apply to Do-all or the Do-across techniques for parallel processing of the simulation since there exist data dependencies from the end of an iteration to the beginning of the next iteration and furthermore data-input and data-output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR) which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
Scalable Multiprocessor for High-Speed Computing in Space

NASA Technical Reports Server (NTRS)

Lux, James; Lang, Minh; Nishimoto, Kouji; Clark, Douglas; Stosic, Dorothy; Bachmann, Alex; Wilkinson, William; Steffke, Richard

2004-01-01

A report discusses the continuing development of a scalable multiprocessor computing system for hard real-time applications aboard a spacecraft. "Hard realtime applications" signifies applications, like real-time radar signal processing, in which the data to be processed are generated at "hundreds" of pulses per second, each pulse "requiring" millions of arithmetic operations. In these applications, the digital processors must be tightly integrated with analog instrumentation (e.g., radar equipment), and data input/output must be synchronized with analog instrumentation, controlled to within fractions of a microsecond. The scalable multiprocessor is a cluster of identical commercial-off-the-shelf generic DSP (digital-signal-processing) computers plus generic interface circuits, including analog-to-digital converters, all controlled by software. The processors are computers interconnected by high-speed serial links. Performance can be increased by adding hardware modules and correspondingly modifying the software. Work is distributed among the processors in a parallel or pipeline fashion by means of a flexible master/slave control and timing scheme. Each processor operates under its own local clock; synchronization is achieved by broadcasting master time signals to all the processors, which compute offsets between the master clock and their local clocks.
Process Management and Exception Handling in Multiprocessor Operating Systems Using Object-Oriented Design Techniques. Revised Sep. 1988

NASA Technical Reports Server (NTRS)

Russo, Vincent; Johnston, Gary; Campbell, Roy

1988-01-01

The programming of the interrupt handling mechanisms, process switching primitives, scheduling mechanism, and synchronization primitives of an operating system for a multiprocessor require both efficient code in order to support the needs of high- performance or real-time applications and careful organization to facilitate maintenance. Although many advantages have been claimed for object-oriented class hierarchical languages and their corresponding design methodologies, the application of these techniques to the design of the primitives within an operating system has not been widely demonstrated. To investigate the role of class hierarchical design in systems programming, the authors have constructed the Choices multiprocessor operating system architecture the C++ programming language. During the implementation, it was found that many operating system design concerns can be represented advantageously using a class hierarchical approach, including: the separation of mechanism and policy; the organization of an operating system into layers, each of which represents an abstract machine; and the notions of process and exception management. In this paper, we discuss an implementation of the low-level primitives of this system and outline the strategy by which we developed our solution.
Method for prefetching non-contiguous data structures

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Brewster, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2009-05-05

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processor only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple perfecting for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefect rather than some other predictive algorithm. This enables hardware to effectively prefect memory access patterns that are non-contiguous, but repetitive.
Low latency memory access and synchronization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Each processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Low latency memory access and synchronization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

A low latency memory system access is provided in association with a weakly-ordered multiprocessor system. Bach processor in the multiprocessor shares resources, and each shared resource has an associated lock within a locking device that provides support for synchronization between the multiple processors in the multiprocessor and the orderly sharing of the resources. A processor only has permission to access a resource when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by store, such that the processormore » only performs a read operation and the hardware locking device performs a subsequent write operation rather than the processor. A simple prefetching for non-contiguous data structures is also disclosed. A memory line is redefined so that in addition to the normal physical memory data, every line includes a pointer that is large enough to point to any other line in the memory, wherein the pointers to determine which memory line to prefetch rather than some other predictive algorithm. This enables hardware to effectively prefetch memory access patterns that are non-contiguous, but repetitive.« less
Energy-efficient fault tolerance in multiprocessor real-time systems

NASA Astrophysics Data System (ADS)

Guo, Yifeng

The recent progress in the multiprocessor/multicore systems has important implications for real-time system design and operation. From vehicle navigation to space applications as well as industrial control systems, the trend is to deploy multiple processors in real-time systems: systems with 4 -- 8 processors are common, and it is expected that many-core systems with dozens of processing cores will be available in near future. For such systems, in addition to general temporal requirement common for all real-time systems, two additional operational objectives are seen as critical: energy efficiency and fault tolerance. An intriguing dimension of the problem is that energy efficiency and fault tolerance are typically conflicting objectives, due to the fact that tolerating faults (e.g., permanent/transient) often requires extra resources with high energy consumption potential. In this dissertation, various techniques for energy-efficient fault tolerance in multiprocessor real-time systems have been investigated. First, the Reliability-Aware Power Management (RAPM) framework, which can preserve the system reliability with respect to transient faults when Dynamic Voltage Scaling (DVS) is applied for energy savings, is extended to support parallel real-time applications with precedence constraints. Next, the traditional Standby-Sparing (SS) technique for dual processor systems, which takes both transient and permanent faults into consideration while saving energy, is generalized to support multiprocessor systems with arbitrary number of identical processors. Observing the inefficient usage of slack time in the SS technique, a Preference-Oriented Scheduling Framework is designed to address the problem where tasks are given preferences for being executed as soon as possible (ASAP) or as late as possible (ALAP). A preference-oriented earliest deadline (POED) scheduler is proposed and its application in multiprocessor systems for energy-efficient fault tolerance is investigated, where tasks' main copies are executed ASAP while backup copies ALAP to reduce the overlapped execution of main and backup copies of the same task and thus reduce energy consumption. All proposed techniques are evaluated through extensive simulations and compared with other state-of-the-art approaches. The simulation results confirm that the proposed schemes can preserve the system reliability while still achieving substantial energy savings. Finally, for both SS and POED based Energy-Efficient Fault-Tolerant (EEFT) schemes, a series of recovery strategies are designed when more than one (transient and permanent) faults need to be tolerated.
Performance issues for domain-oriented time-driven distributed simulations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1987-01-01

It has long been recognized that simulations form an interesting and important class of computations that may benefit from distributed or parallel processing. Since the point of parallel processing is improved performance, the recent proliferation of multiprocessors requires that we consider the performance issues that naturally arise when attempting to implement a distributed simulation. Three such issues are: (1) the problem of mapping the simulation onto the architecture, (2) the possibilities for performing redundant computation in order to reduce communication, and (3) the avoidance of deadlock due to distributed contention for message-buffer space. These issues are discussed in the context of a battlefield simulation implemented on a medium-scale multiprocessor message-passing architecture.

MPF: A portable message passing facility for shared memory multiprocessors

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

1987-01-01

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Partitioning and packing mathematical simulation models for calculation on parallel computers

NASA Technical Reports Server (NTRS)

Arpasi, D. J.; Milner, E. J.

1986-01-01

The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
Event parallelism: Distributed memory parallel computing for high energy physics experiments

NASA Astrophysics Data System (ADS)

Nash, Thomas

1989-12-01

This paper describes the present and expected future development of distributed memory parallel computers for high energy physics experiments. It covers the use of event parallel microprocessor farms, particularly at Fermilab, including both ACP multiprocessors and farms of MicroVAXES. These systems have proven very cost effective in the past. A case is made for moving to the more open environment of UNIX and RISC processors. The 2nd Generation ACP Multiprocessor System, which is based on powerful RISC system, is described. Given the promise of still more extraordinary increases in processor performance, a new emphasis on point to point, rather than bussed, communication will be required. Developments in this direction are described.
A multiarchitecture parallel-processing development environment

NASA Technical Reports Server (NTRS)

Townsend, Scott; Blech, Richard; Cole, Gary

1993-01-01

A description is given of the hardware and software of a multiprocessor test bed - the second generation Hypercluster system. The Hypercluster architecture consists of a standard hypercube distributed-memory topology, with multiprocessor shared-memory nodes. By using standard, off-the-shelf hardware, the system can be upgraded to use rapidly improving computer technology. The Hypercluster's multiarchitecture nature makes it suitable for researching parallel algorithms in computational field simulation applications (e.g., computational fluid dynamics). The dedicated test-bed environment of the Hypercluster and its custom-built software allows experiments with various parallel-processing concepts such as message passing algorithms, debugging tools, and computational 'steering'. Such research would be difficult, if not impossible, to achieve on shared, commercial systems.
Development and evaluation of a fault-tolerant multiprocessor (FTMP) computer. Volume 4: FTMP executive summary

NASA Technical Reports Server (NTRS)

Smith, T. B., III; Lala, J. H.

1984-01-01

The FTMP architecture is a high reliability computer concept modeled after a homogeneous multiprocessor architecture. Elements of the FTMP are operated in tight synchronism with one another and hardware fault-detection and fault-masking is provided which is transparent to the software. Operating system design and user software design is thus greatly simplified. Performance of the FTMP is also comparable to that of a simplex equivalent due to the efficiency of fault handling hardware. The FTMP project constructed an engineering module of the FTMP, programmed the machine and extensively tested the architecture through fault injection and other stress testing. This testing confirmed the soundness of the FTMP concepts.
A measurement-based performability model for a multiprocessor system

NASA Technical Reports Server (NTRS)

Ilsueh, M. C.; Iyer, Ravi K.; Trivedi, K. S.

1987-01-01

A measurement-based performability model based on real error-data collected on a multiprocessor system is described. Model development from the raw errror-data to the estimation of cumulative reward is described. Both normal and failure behavior of the system are characterized. The measured data show that the holding times in key operational and failure states are not simple exponential and that semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different failure types and recovery procedures.
Instrumentation, performance visualization, and debugging tools for multiprocessors

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Fineman, Charles E.; Hontalas, Philip J.

1991-01-01

The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs.
Efficient partitioning and assignment on programs for multiprocessor execution

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1993-01-01

The general problem studied is that of segmenting or partitioning programs for distribution across a multiprocessor system. Efficient partitioning and the assignment of program elements are of great importance since the time consumed in this overhead activity may easily dominate the computation, effectively eliminating any gains made by the use of the parallelism. In this study, the partitioning of sequentially structured programs (written in FORTRAN) is evaluated. Heuristics, developed for similar applications are examined. Finally, a model for queueing networks with finite queues is developed which may be used to analyze multiprocessor system architectures with a shared memory approach to the problem of partitioning. The properties of sequentially written programs form obstacles to large scale (at the procedure or subroutine level) parallelization. Data dependencies of even the minutest nature, reflecting the sequential development of the program, severely limit parallelism. The design of heuristic algorithms is tied to the experience gained in the parallel splitting. Parallelism obtained through the physical separation of data has seen some success, especially at the data element level. Data parallelism on a grander scale requires models that accurately reflect the effects of blocking caused by finite queues. A model for the approximation of the performance of finite queueing networks is developed. This model makes use of the decomposition approach combined with the efficiency of product form solutions.
Portable programming on parallel/networked computers using the Application Portable Parallel Library (APPL)

NASA Technical Reports Server (NTRS)

Quealy, Angela; Cole, Gary L.; Blech, Richard A.

1993-01-01

The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
High-performance multiprocessor architecture for a 3-D lattice gas model

NASA Technical Reports Server (NTRS)

Lee, F.; Flynn, M.; Morf, M.

1991-01-01

The lattice gas method has recently emerged as a promising discrete particle simulation method in areas such as fluid dynamics. We present a very high-performance scalable multiprocessor architecture, called ALGE, proposed for the simulation of a realistic 3-D lattice gas model, Henon's 24-bit FCHC isometric model. Each of these VLSI processors is as powerful as a CRAY-2 for this application. ALGE is scalable in the sense that it achieves linear speedup for both fixed and increasing problem sizes with more processors. The core computation of a lattice gas model consists of many repetitions of two alternating phases: particle collision and propagation. Functional decomposition by symmetry group and virtual move are the respective keys to efficient implementation of collision and propagation.
Shared Memory Parallelization of an Implicit ADI-type CFD Code

NASA Technical Reports Server (NTRS)

Hauser, Th.; Huang, P. G.

1999-01-01

A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.
The performance of disk arrays in shared-memory database machines

NASA Technical Reports Server (NTRS)

Katz, Randy H.; Hong, Wei

1993-01-01

In this paper, we examine how disk arrays and shared memory multiprocessors lead to an effective method for constructing database machines for general-purpose complex query processing. We show that disk arrays can lead to cost-effective storage systems if they are configured from suitably small formfactor disk drives. We introduce the storage system metric data temperature as a way to evaluate how well a disk configuration can sustain its workload, and we show that disk arrays can sustain the same data temperature as a more expensive mirrored-disk configuration. We use the metric to evaluate the performance of disk arrays in XPRS, an operational shared-memory multiprocessor database system being developed at the University of California, Berkeley.
Dynamic programming on a shared-memory multiprocessor

NASA Technical Reports Server (NTRS)

Edmonds, Phil; Chu, Eleanor; George, Alan

1993-01-01

Three new algorithms for solving dynamic programming problems on a shared-memory parallel computer are described. All three algorithms attempt to balance work load, while keeping synchronization cost low. In particular, for a multiprocessor having p processors, an analysis of the best algorithm shows that the arithmetic cost is O(n-cubed/6p) and that the synchronization cost is O(absolute value of log sub C n) if p much less than n, where C = (2p-1)/(2p + 1) and n is the size of the problem. The low synchronization cost is important for machines where synchronization is expensive. Analysis and experiments show that the best algorithm is effective in balancing the work load and producing high efficiency.
Multiprocessor shared-memory information exchange

DOE Office of Scientific and Technical Information (OSTI.GOV)

Santoline, L.L.; Bowers, M.D.; Crew, A.W.

1989-02-01

In distributed microprocessor-based instrumentation and control systems, the inter-and intra-subsystem communication requirements ultimately form the basis for the overall system architecture. This paper describes a software protocol which addresses the intra-subsystem communications problem. Specifically the protocol allows for multiple processors to exchange information via a shared-memory interface. The authors primary goal is to provide a reliable means for information to be exchanged between central application processor boards (masters) and dedicated function processor boards (slaves) in a single computer chassis. The resultant Multiprocessor Shared-Memory Information Exchange (MSMIE) protocol, a standard master-slave shared-memory interface suitable for use in nuclear safety systems, ismore » designed to pass unidirectional buffers of information between the processors while providing a minimum, deterministic cycle time for this data exchange.« less
Multiprocessor Neural Network in Healthcare.

PubMed

Godó, Zoltán Attila; Kiss, Gábor; Kocsis, Dénes

2015-01-01

A possible way of creating a multiprocessor artificial neural network is by the use of microcontrollers. The RISC processors' high performance and the large number of I/O ports mean they are greatly suitable for creating such a system. During our research, we wanted to see if it is possible to efficiently create interaction between the artifical neural network and the natural nervous system. To achieve as much analogy to the living nervous system as possible, we created a frequency-modulated analog connection between the units. Our system is connected to the living nervous system through 128 microelectrodes. Two-way communication is provided through A/D transformation, which is even capable of testing psychopharmacons. The microcontroller-based analog artificial neural network can play a great role in medical singal processing, such as ECG, EEG etc.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

1995-01-01

High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.
Dynamic modelling and estimation of the error due to asynchronism in a redundant asynchronous multiprocessor system

NASA Technical Reports Server (NTRS)

Huynh, Loc C.; Duval, R. W.

1986-01-01

The use of Redundant Asynchronous Multiprocessor System to achieve ultrareliable Fault Tolerant Control Systems shows great promise. The development has been hampered by the inability to determine whether differences in the outputs of redundant CPU's are due to failures or to accrued error built up by slight differences in CPU clock intervals. This study derives an analytical dynamic model of the difference between redundant CPU's due to differences in their clock intervals and uses this model with on-line parameter identification to idenitify the differences in the clock intervals. The ability of this methodology to accurately track errors due to asynchronisity generate an error signal with the effect of asynchronisity removed and this signal may be used to detect and isolate actual system failures.
Probabilistic evaluation of on-line checks in fault-tolerant multiprocessor systems

NASA Technical Reports Server (NTRS)

Nair, V. S. S.; Hoskote, Yatin V.; Abraham, Jacob A.

1992-01-01

The analysis of fault-tolerant multiprocessor systems that use concurrent error detection (CED) schemes is much more difficult than the analysis of conventional fault-tolerant architectures. Various analytical techniques have been proposed to evaluate CED schemes deterministically. However, these approaches are based on worst-case assumptions related to the failure of system components. Often, the evaluation results do not reflect the actual fault tolerance capabilities of the system. A probabilistic approach to evaluate the fault detecting and locating capabilities of on-line checks in a system is developed. The various probabilities associated with the checking schemes are identified and used in the framework of the matrix-based model. Based on these probabilistic matrices, estimates for the fault tolerance capabilities of various systems are derived analytically.
NASA Workshop on Computational Structural Mechanics 1987, part 1

NASA Technical Reports Server (NTRS)

Sykes, Nancy P. (Editor)

1989-01-01

Topics in Computational Structural Mechanics (CSM) are reviewed. CSM parallel structural methods, a transputer finite element solver, architectures for multiprocessor computers, and parallel eigenvalue extraction are among the topics discussed.
Job-mix modeling and system analysis of an aerospace multiprocessor.

NASA Technical Reports Server (NTRS)

Mallach, E. G.

1972-01-01

An aerospace guidance computer organization, consisting of multiple processors and memory units attached to a central time-multiplexed data bus, is described. A job mix for this type of computer is obtained by analysis of Apollo mission programs. Multiprocessor performance is then analyzed using: 1) queuing theory, under certain 'limiting case' assumptions; 2) Markov process methods; and 3) system simulation. Results of the analyses indicate: 1) Markov process analysis is a useful and efficient predictor of simulation results; 2) efficient job execution is not seriously impaired even when the system is so overloaded that new jobs are inordinately delayed in starting; 3) job scheduling is significant in determining system performance; and 4) a system having many slow processors may or may not perform better than a system of equal power having few fast processors, but will not perform significantly worse.

A macro-micro robot for precise force applications

NASA Technical Reports Server (NTRS)

Marzwell, Neville I.; Wang, Yulun

1993-01-01

This paper describes an 8 degree-of-freedom macro-micro robot capable of performing tasks which require accurate force control. Applications such as polishing, finishing, grinding, deburring, and cleaning are a few examples of tasks which need this capability. Currently these tasks are either performed manually or with dedicated machinery because of the lack of a flexible and cost effective tool, such as a programmable force-controlled robot. The basic design and control of the macro-micro robot is described in this paper. A modular high-performance multiprocessor control system was designed to provide sufficient compute power for executing advanced control methods. An 8 degree of freedom macro-micro mechanism was constructed to enable accurate tip forces. Control algorithms based on the impedance control method were derived, coded, and load balanced for maximum execution speed on the multiprocessor system.
Closed-form solutions of performability. [modeling of a degradable buffer/multiprocessor system

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1981-01-01

Methods which yield closed form performability solutions for continuous valued variables are developed. The models are similar to those employed in performance modeling (i.e., Markovian queueing models) but are extended so as to account for variations in structure due to faults. In particular, the modeling of a degradable buffer/multiprocessor system is considered whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time. To avoid known difficulties associated with exact transient solutions, an approximate decomposition of the model is employed permitting certain submodels to be solved in equilibrium. These solutions are then incorporated in a model with fewer transient states and by solving the latter, a closed form solution of the system's performability is obtained. In conclusion, some applications of this solution are discussed and illustrated, including an example of design optimization.
Parallel algorithm of VLBI software correlator under multiprocessor environment

NASA Astrophysics Data System (ADS)

Zheng, Weimin; Zhang, Dong

2007-11-01

The correlator is the key signal processing equipment of a Very Lone Baseline Interferometry (VLBI) synthetic aperture telescope. It receives the mass data collected by the VLBI observatories and produces the visibility function of the target, which can be used to spacecraft position, baseline length measurement, synthesis imaging, and other scientific applications. VLBI data correlation is a task of data intensive and computation intensive. This paper presents the algorithms of two parallel software correlators under multiprocessor environments. A near real-time correlator for spacecraft tracking adopts the pipelining and thread-parallel technology, and runs on the SMP (Symmetric Multiple Processor) servers. Another high speed prototype correlator using the mixed Pthreads and MPI (Massage Passing Interface) parallel algorithm is realized on a small Beowulf cluster platform. Both correlators have the characteristic of flexible structure, scalability, and with 10-station data correlating abilities.
HyperForest: A high performance multi-processor architecture for real-time intelligent systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Garcia, P. Jr.; Rebeil, J.P.; Pollard, H.

1997-04-01

Intelligent Systems are characterized by the intensive use of computer power. The computer revolution of the last few years is what has made possible the development of the first generation of Intelligent Systems. Software for second generation Intelligent Systems will be more complex and will require more powerful computing engines in order to meet real-time constraints imposed by new robots, sensors, and applications. A multiprocessor architecture was developed that merges the advantages of message-passing and shared-memory structures: expendability and real-time compliance. The HyperForest architecture will provide an expandable real-time computing platform for computationally intensive Intelligent Systems and open the doorsmore » for the application of these systems to more complex tasks in environmental restoration and cleanup projects, flexible manufacturing systems, and DOE`s own production and disassembly activities.« less
Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
Development and evaluation of a Fault-Tolerant Multiprocessor (FTMP) computer. Volume 2: FTMP software

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, T. B., III

1983-01-01

The software developed for the Fault-Tolerant Multiprocessor (FTMP) is described. The FTMP executive is a timer-interrupt driven dispatcher that schedules iterative tasks which run at 3.125, 12.5, and 25 Hz. Major tasks which run under the executive include system configuration control, flight control, and display. The flight control task includes autopilot and autoland functions for a jet transport aircraft. System Displays include status displays of all hardware elements (processors, memories, I/O ports, buses), failure log displays showing transient and hard faults, and an autopilot display. All software is in a higher order language (AED, an ALGOL derivative). The executive is a fully distributed general purpose executive which automatically balances the load among available processor triads. Provisions for graceful performance degradation under processing overload are an integral part of the scheduling algorithms.
Performance and evaluation of real-time multicomputer control systems

NASA Technical Reports Server (NTRS)

Shin, K. G.

1985-01-01

Three experiments on fault tolerant multiprocessors (FTMP) were begun. They are: (1) measurement of fault latency in FTMP; (2) validation and analysis of FTMP synchronization protocols; and investigation of error propagation in FTMP.
Stochastic airspace simulation tool development

DOT National Transportation Integrated Search

2009-10-01

Modeling and simulation is often used to study : the physical world when observation may not be : practical. The overall goal of a recent and ongoing : simulation tool project has been to provide a : documented, lifecycle-managed, multi-processor : c...
ARTS III/Parallel Processor Design Study

DOT National Transportation Integrated Search

1975-04-01

It was the purpose of this design study to investigate the feasibility, suitability, and cost-effectiveness of augmenting the ARTS III failsafe/failsoft multiprocessor system with a form of parallel processor to accomodate a large growth in air traff...
Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

DOEpatents

Gara, Alan; Ohmacht, Martin

2014-09-16

In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
Fault recovery characteristics of the fault tolerant multi-processor

NASA Technical Reports Server (NTRS)

Padilla, Peter A.

1990-01-01

The fault handling performance of the fault tolerant multiprocessor (FTMP) was investigated. Fault handling errors detected during fault injection experiments were characterized. In these fault injection experiments, the FTMP disabled a working unit instead of the faulted unit once every 500 faults, on the average. System design weaknesses allow active faults to exercise a part of the fault management software that handles byzantine or lying faults. It is pointed out that these weak areas in the FTMP's design increase the probability that, for any hardware fault, a good LRU (line replaceable unit) is mistakenly disabled by the fault management software. It is concluded that fault injection can help detect and analyze the behavior of a system in the ultra-reliable regime. Although fault injection testing cannot be exhaustive, it has been demonstrated that it provides a unique capability to unmask problems and to characterize the behavior of a fault-tolerant system.
Testing and operating a multiprocessor chip with processor redundancy

DOEpatents

Bellofatto, Ralph E; Douskey, Steven M; Haring, Rudolf A; McManus, Moyra K; Ohmacht, Martin; Schmunkamp, Dietmar; Sugavanam, Krishnan; Weatherford, Bryan J

2014-10-21

A system and method for improving the yield rate of a multiprocessor semiconductor chip that includes primary processor cores and one or more redundant processor cores. A first tester conducts a first test on one or more processor cores, and encodes results of the first test in an on-chip non-volatile memory. A second tester conducts a second test on the processor cores, and encodes results of the second test in an external non-volatile storage device. An override bit of a multiplexer is set if a processor core fails the second test. In response to the override bit, the multiplexer selects a physical-to-logical mapping of processor IDs according to one of: the encoded results in the memory device or the encoded results in the external storage device. On-chip logic configures the processor cores according to the selected physical-to-logical mapping.
Strategies for concurrent processing of complex algorithms in data driven architectures

NASA Technical Reports Server (NTRS)

Stoughton, John W.; Mielke, Roland R.

1988-01-01

The purpose is to document research to develop strategies for concurrent processing of complex algorithms in data driven architectures. The problem domain consists of decision-free algorithms having large-grained, computationally complex primitive operations. Such are often found in signal processing and control applications. The anticipated multiprocessor environment is a data flow architecture containing between two and twenty computing elements. Each computing element is a processor having local program memory, and which communicates with a common global data memory. A new graph theoretic model called ATAMM which establishes rules for relating a decomposed algorithm to its execution in a data flow architecture is presented. The ATAMM model is used to determine strategies to achieve optimum time performance and to develop a system diagnostic software tool. In addition, preliminary work on a new multiprocessor operating system based on the ATAMM specifications is described.
A robot arm simulation with a shared memory multiprocessor machine

NASA Technical Reports Server (NTRS)

Kim, Sung-Soo; Chuang, Li-Ping

1989-01-01

A parallel processing scheme for a single chain robot arm is presented for high speed computation on a shared memory multiprocessor. A recursive formulation that is derived from a virtual work form of the d'Alembert equations of motion is utilized for robot arm dynamics. A joint drive system that consists of a motor rotor and gears is included in the arm dynamics model, in order to take into account gyroscopic effects due to the spinning of the rotor. The fine grain parallelism of mechanical and control subsystem models is exploited, based on independent computation associated with bodies, joint drive systems, and controllers. Efficiency and effectiveness of the parallel scheme are demonstrated through simulations of a telerobotic manipulator arm. Two different mechanical subsystem models, i.e., with and without gyroscopic effects, are compared, to show the trade-off between efficiency and accuracy.
Strategies for concurrent processing of complex algorithms in data driven architectures

NASA Technical Reports Server (NTRS)

Stoughton, John W.; Mielke, Roland R.

1987-01-01

The results of ongoing research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a spatial distributed computer environment is presented. This model is identified by the acronym ATAMM (Algorithm/Architecture Mapping Model). The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to optimize computational concurrency in the multiprocessor environment and to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.
Partitioning strategy for efficient nonlinear finite element dynamic analysis on multiprocessor computers

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.; Peters, Jeanne M.

1989-01-01

A computational procedure is presented for the nonlinear dynamic analysis of unsymmetric structures on vector multiprocessor systems. The procedure is based on a novel hierarchical partitioning strategy in which the response of the unsymmetric and antisymmetric response vectors (modes), each obtained by using only a fraction of the degrees of freedom of the original finite element model. The three key elements of the procedure which result in high degree of concurrency throughout the solution process are: (1) mixed (or primitive variable) formulation with independent shape functions for the different fields; (2) operator splitting or restructuring of the discrete equations at each time step to delineate the symmetric and antisymmetric vectors constituting the response; and (3) two level iterative process for generating the response of the structure. An assessment is made of the effectiveness of the procedure on the CRAY X-MP/4 computers.
System architecture for asynchronous multi-processor robotic control system

NASA Technical Reports Server (NTRS)

Steele, Robert D.; Long, Mark; Backes, Paul

1993-01-01

The architecture for the Modular Telerobot Task Execution System (MOTES) as implemented in the Supervisory Telerobotics (STELER) Laboratory is described. MOTES is the software component of the remote site of a local-remote telerobotic system which is being developed for NASA for space applications, in particular Space Station Freedom applications. The system is being developed to provide control and supervised autonomous control to support both space based operation and ground-remote control with time delay. The local-remote architecture places task planning responsibilities at the local site and task execution responsibilities at the remote site. This separation allows the remote site to be designed to optimize task execution capability within a limited computational environment such as is expected in flight systems. The local site task planning system could be placed on the ground where few computational limitations are expected. MOTES is written in the Ada programming language for a multiprocessor environment.
An Evaluation of Architectural Platforms for Parallel Navier-Stokes Computations

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1996-01-01

We study the computational, communication, and scalability characteristics of a computational fluid dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architecture platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and distributed memory multiprocessors with different topologies - the IBM SP and the Cray T3D. We investigate the impact of various networks connecting the cluster of workstations on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Parallelizing Navier-Stokes Computations on a Variety of Architectural Platforms

NASA Technical Reports Server (NTRS)

Jayasimha, D. N.; Hayder, M. E.; Pillay, S. K.

1997-01-01

We study the computational, communication, and scalability characteristics of a Computational Fluid Dynamics application, which solves the time accurate flow field of a jet using the compressible Navier-Stokes equations, on a variety of parallel architectural platforms. The platforms chosen for this study are a cluster of workstations (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), distributed memory multiprocessors with different topologies-the IBM SP and the Cray T3D. We investigate the impact of various networks, connecting the cluster of workstations, on the performance of the application and the overheads induced by popular message passing libraries used for parallelization. The work also highlights the importance of matching the memory bandwidth to the processor speed for good single processor performance. By studying the performance of an application on a variety of architectures, we are able to point out the strengths and weaknesses of each of the example computing platforms.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

1994-01-01

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.

The Automated Instrumentation and Monitoring System (AIMS) reference manual

NASA Technical Reports Server (NTRS)

Yan, Jerry; Hontalas, Philip; Listgarten, Sherry

1993-01-01

Whether a researcher is designing the 'next parallel programming paradigm,' another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of execution traces can help computer designers and software architects to uncover system behavior and to take advantage of specific application characteristics and hardware features. A software tool kit that facilitates performance evaluation of parallel applications on multiprocessors is described. The Automated Instrumentation and Monitoring System (AIMS) has four major software components: a source code instrumentor which automatically inserts active event recorders into the program's source code before compilation; a run time performance-monitoring library, which collects performance data; a trace file animation and analysis tool kit which reconstructs program execution from the trace file; and a trace post-processor which compensate for data collection overhead. Besides being used as prototype for developing new techniques for instrumenting, monitoring, and visualizing parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware test beds to evaluate their impact on user productivity. Currently, AIMS instrumentors accept FORTRAN and C parallel programs written for Intel's NX operating system on the iPSC family of multi computers. A run-time performance-monitoring library for the iPSC/860 is included in this release. We plan to release monitors for other platforms (such as PVM and TMC's CM-5) in the near future. Performance data collected can be graphically displayed on workstations (e.g. Sun Sparc and SGI) supporting X-Windows (in particular, Xl IR5, Motif 1.1.3).
The Automated Instrumentation and Monitoring System (AIMS): Design and Architecture. 3.2

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Schmidt, Melisa; Schulbach, Cathy; Bailey, David (Technical Monitor)

1997-01-01

Whether a researcher is designing the 'next parallel programming paradigm', another 'scalable multiprocessor' or investigating resource allocation algorithms for multiprocessors, a facility that enables parallel program execution to be captured and displayed is invaluable. Careful analysis of such information can help computer and software architects to capture, and therefore, exploit behavioral variations among/within various parallel programs to take advantage of specific hardware characteristics. A software tool-set that facilitates performance evaluation of parallel applications on multiprocessors has been put together at NASA Ames Research Center under the sponsorship of NASA's High Performance Computing and Communications Program over the past five years. The Automated Instrumentation and Monitoring Systematic has three major software components: a source code instrumentor which automatically inserts active event recorders into program source code before compilation; a run-time performance monitoring library which collects performance data; and a visualization tool-set which reconstructs program execution based on the data collected. Besides being used as a prototype for developing new techniques for instrumenting, monitoring and presenting parallel program execution, AIMS is also being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Currently, the execution of FORTRAN and C programs on the Intel Paragon and PALM workstations can be automatically instrumented and monitored. Performance data thus collected can be displayed graphically on various workstations. The process of performance tuning with AIMS will be illustrated using various NAB Parallel Benchmarks. This report includes a description of the internal architecture of AIMS and a listing of the source code.
Synchronizing Data-Bus Messages

NASA Technical Reports Server (NTRS)

Harris, L. H.

1985-01-01

Adapter allows communications among as many as 30 data processors without central bus controller. Adapter improves reliability of multiprocessor system by eliminating point of failure that causes entire system to fail. Scheme prevents data collisions and eliminates nonessential polling, thereby reducing power consumption.
Scheduler-Conscious Synchronization.

DTIC Science & Technology

1994-12-01

SPONSORING I MONITORING Office of Naval Research ARPA AGENCY REPORT NUMBER Information Systems 3701 N. Fairfax Drive TR 550 Arlington VA 22217 Arlington VA...Broughton. A New Approach to Exclusive Data Access in Shared Memory Multiprocessors. Technical Report UCRL -97663, Lawrence Livermore National Laboratory
Highway Traffic Simulations on Multi-Processor Computers

DOT National Transportation Integrated Search

1997-01-01

A computer model has been developed to simulate highway traffic for various degrees of automation with a high degree of fidelity in regard to driver control and vehicle characteristics. The model simulates vehicle maneuvering in a multi-lane highway ...
Evaluation of the Cedar memory system: Configuration of 16 by 16

NASA Technical Reports Server (NTRS)

Gallivan, K.; Jalby, W.; Wijshoff, H.

1991-01-01

Some basic results on the performance of the Cedar multiprocessor system are presented. Empirical results on the 16 processor 16 memory bank system configuration, which show the behavior of the Cedar system under different modes of operation are presented.
The " Swarm of Ants vs. Herd of Elephants" Debated Revisited: Performance Measurements of PVM-Overflow Across a Wide Spectrum of Architectures

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Jespersen, Dennis; Buning, Peter; Bailey, David (Technical Monitor)

1996-01-01

The Gorden Bell Prizes given out at Supercomputing every year includes at least two catergories: performance (highest GFLOP count) and price-performance (GFLOP/million $$) for real applications. In the past five years, the winners of the price-performance categories all came from networks of work-stations. This reflects three important facts: 1. supercomputers are still too expensive for the masses; 2. achieving high performance for real applications takes real work; and, most importantly; 3. it is possible to obtain acceptable performance for certain real applications on network of work stations. With the continued advance of network technology as well as increased performance of "desktop" workstation, the "Swarm of Ants vs. Herd of Elephants" debate, which began with vector multiprocessors (VPPs) against SIMD type multiprocessors (e.g. CM2), is now recast as VPPs against Symetric Multiprocessors (SMPs, e.g. SGI PowerChallenge). This paper reports on performance studies we performed solving a large scale (2-million grid pt.s) CFD problem involving a Boeing 747 based on a parallel version of OVERFLOW that utilizes message passing on PVM. A performance monitoring tool developed under NASA HPCC, called AIMS, was used to instrument and analyze the the performance data thus obtained. We plan to compare its performance data obtained across a wide spectrum of architectures including: the Cray C90, IBM/SP2, SGI/Power Challenge Cluster, to a group of workstations connected over a simple network. The metrics of comparison includes speed-up, price-performance, throughput, and turn-around time. We also plan to present a plan of attack for various issues that will make the execution of Grand Challenge Applications across the Global Information Infrastructure a reality.
Parallel implementation and evaluation of motion estimation system algorithms on a distributed memory multiprocessor using knowledge based mappings

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
Performance and economy of a fault-tolerant multiprocessor

NASA Technical Reports Server (NTRS)

Lala, J. H.; Smith, C. J.

1979-01-01

The FTMP (Fault-Tolerant Multiprocessor) is one of two central aircraft fault-tolerant architectures now in the prototype phase under NASA sponsorship. The intended application of the computer includes such critical real-time tasks as 'fly-by-wire' active control and completely automatic Category III landings of commercial aircraft. The FTMP architecture is briefly described and it is shown that it is a viable solution to the multi-faceted problems of safety, speed, and cost. Three job dispatch strategies are described, and their results with respect to job-starting delay are presented. The first strategy is a simple First-Come-First-Serve (FCFS) job dispatch executive. The other two schedulers are an adaptive FCFS and an interrupt driven scheduler. Three failure modes are discussed, and the FTMP survival probability in the face of random hard failures is evaluated. It is noted that the hourly cost of operating two FTMPs in a transport aircraft can be as little as one-to-two percent of the total flight-hour cost of the aircraft.
A design fix to supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks

NASA Astrophysics Data System (ADS)

Devaraj, Rajesh; Sarkar, Arnab; Biswas, Santosh

2015-11-01

In the article 'Supervisory control for fault-tolerant scheduling of real-time multiprocessor systems with aperiodic tasks', Park and Cho presented a systematic way of computing a largest fault-tolerant and schedulable language that provides information on whether the scheduler (i.e., supervisor) should accept or reject a newly arrived aperiodic task. The computation of such a language is mainly dependent on the task execution model presented in their paper. However, the task execution model is unable to capture the situation when the fault of a processor occurs even before the task has arrived. Consequently, a task execution model that does not capture this fact may possibly be assigned for execution on a faulty processor. This problem has been illustrated with an appropriate example. Then, the task execution model of Park and Cho has been modified to strengthen the requirement that none of the tasks are assigned for execution on a faulty processor.
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Sequoia: A fault-tolerant tightly coupled multiprocessor for transaction processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernstein, P.A.

1988-02-01

The Sequoia computer is a tightly coupled multiprocessor, and thus attains the performance advantages of this style of architecture. It avoids most of the fault-tolerance disadvantages of tight coupling by using a new fault-tolerance design. The Sequoia architecture is similar to other multimicroprocessor architectures, such as those of Encore and Sequent, in that it gives dozens of microprocessors shared access to a large main memory. It resembles the Stratus architecture in its extensive use of hardware fault-detection techniques. It resembles Stratus and Auragen in its ability to quickly recover all processes after a single point failure, transparently to the user.more » However, Sequoia is unique in its combination of a large-scale tightly coupled architecture with a hardware approach to fault tolerance. This article gives an overview of how the hardware architecture and operating systems (OS) work together to provide a high degree of fault tolerance with good system performance.« less
Parallelization of NAS Benchmarks for Shared Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Waheed, Abdul; Yan, Jerry C.; Saini, Subhash (Technical Monitor)

1998-01-01

This paper presents our experiences of parallelizing the sequential implementation of NAS benchmarks using compiler directives on SGI Origin2000 distributed shared memory (DSM) system. Porting existing applications to new high performance parallel and distributed computing platforms is a challenging task. Ideally, a user develops a sequential version of the application, leaving the task of porting to new generations of high performance computing systems to parallelization tools and compilers. Due to the simplicity of programming shared-memory multiprocessors, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers on SGI Origin2000 support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Additionally, supporting tools can accomplish this process automatically and present the results of parallelization to the users. We experimented with these compiler directives and supporting tools by parallelizing sequential implementation of NAS benchmarks. Results reported in this paper indicate that with minimal effort, the performance gain is comparable with the hand-parallelized, carefully optimized, message-passing implementations of the same benchmarks.
Multitasking runtime systems for the Cedar Multiprocessor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guzzi, M.D.

1986-07-01

The programming of a MIMD machine is more complex than for SISD and SIMD machines. The multiple computational resources of the machine must be made available to the programming language compiler and to the programmer so that multitasking programs may be written. This thesis will explore the additional complexity of programming a MIMD machine, the Cedar Multiprocessor specifically, and the multitasking runtime system necessary to provide multitasking resources to the user. First, the problem will be well defined: the Cedar machine, its operating system, the programming language, and multitasking concepts will be described. Second, a solution to the problem, calledmore » macrotasking, will be proposed. This solution provides multitasking facilities to the programmer at a very coarse level with many visible machine dependencies. Third, an alternate solution, called microtasking, will be proposed. This solution provides multitasking facilities of a much finer grain. This solution does not depend so rigidly on the specific architecture of the machine. Finally, the two solutions will be compared for effectiveness. 12 refs., 16 figs.« less
Operating experience with a VMEbus multiprocessor system for data acquisition and reduction in nuclear physics

NASA Astrophysics Data System (ADS)

Kutt, P. H.; Balamuth, D. P.

1989-10-01

Summary form only given, as follows. A multiprocessor system based on commercially available VMEbus components has been developed for the acquisition and reduction of event-mode data in nuclear physics experiments. The system contains seven 68000 CPUs and 14 Mbyte of memory. A minimal operating system handles data transfer and task allocation, and a compiler for a specially designed event analysis language produces code for the processors. The system has been in operation for four years at the University of Pennsylvania Tandem Accelerator Laboratory. Computation rates over three times that of a MicroVAX II have been achieved at a fraction of the cost. The use of WORM optical disks for event recording allows the processing of gigabyte data sets without operator intervention. A more powerful system is being planned which will make use of recently developed RISC (reduced instruction set computer) processors to obtain an order of magnitude increase in computing power per node.
USC orthogonal multiprocessor for image processing with neural networks

NASA Astrophysics Data System (ADS)

Hwang, Kai; Panda, Dhabaleswar K.; Haddadi, Navid

1990-07-01

This paper presents the architectural features and imaging applications of the Orthogonal MultiProcessor (OMP) system, which is under construction at the University of Southern California with research funding from NSF and assistance from several industrial partners. The prototype OMP is being built with 16 Intel i860 RISC microprocessors and 256 parallel memory modules using custom-designed spanning buses, which are 2-D interleaved and orthogonally accessed without conflicts. The 16-processor OMP prototype is targeted to achieve 430 MIPS and 600 Mflops, which have been verified by simulation experiments based on the design parameters used. The prototype OMP machine will be initially applied for image processing, computer vision, and neural network simulation applications. We summarize important vision and imaging algorithms that can be restructured with neural network models. These algorithms can efficiently run on the OMP hardware with linear speedup. The ultimate goal is to develop a high-performance Visual Computer (Viscom) for integrated low- and high-level image processing and vision tasks.
Alpha: A real-time decentralized operating system for mission-oriented system integration and operation

NASA Technical Reports Server (NTRS)

Jensen, E. Douglas

1988-01-01

Alpha is a new kind of operating system that is unique in two highly significant ways. First, it is decentralized transparently providing reliable resource management across physically dispersed nodes, so that distributed applications programming can be done largely as though it were centralized. And second, it provides comprehensive, high technology support for real-time system integration and operation, an application area which consists predominately of aperiodic activities having critical time constraints such as deadlines. Alpha is extremely adaptable so that it can be easily optimized for a wide range of problem-specific functionality, performance, and cost. Alpha is the first systems effort of the Archons Project, and the prototype was created at Carnegie-Mellon University directly on modified Sun multiprocessor workstation hardware. It has been demonstrated with a real-time C(sup 2) application. Continuing research is leading to a series of enhanced follow-ons to Alpha; these are portable but initially hosted on Concurrent's MASSCOMP line of multiprocessor products.
A pipelined architecture for real time correction of non-uniformity in infrared focal plane arrays imaging system using multiprocessors

NASA Astrophysics Data System (ADS)

Zou, Liang; Fu, Zhuang; Zhao, YanZheng; Yang, JunYan

2010-07-01

This paper proposes a kind of pipelined electric circuit architecture implemented in FPGA, a very large scale integrated circuit (VLSI), which efficiently deals with the real time non-uniformity correction (NUC) algorithm for infrared focal plane arrays (IRFPA). Dual Nios II soft-core processors and a DSP with a 64+ core together constitute this image system. Each processor undertakes own systematic task, coordinating its work with each other's. The system on programmable chip (SOPC) in FPGA works steadily under the global clock frequency of 96Mhz. Adequate time allowance makes FPGA perform NUC image pre-processing algorithm with ease, which has offered favorable guarantee for the work of post image processing in DSP. And at the meantime, this paper presents a hardware (HW) and software (SW) co-design in FPGA. Thus, this systematic architecture yields an image processing system with multiprocessor, and a smart solution to the satisfaction with the performance of the system.
Fault-free behavior of reliable multiprocessor systems: FTMP experiments in AIRLAB

NASA Technical Reports Server (NTRS)

Clune, E.; Segall, Z.; Siewiorek, D.

1985-01-01

This report describes a set of experiments which were implemented on the Fault tolerant Multi-Processor (FTMP) at NASA/Langley's AIRLAB facility. These experiments are part of an effort to formulate and evaluate validation methodologies for fault-tolerant computers. This report deals with the measurement of single parameters (baselines) of a fault free system. The initial set of baseline experiments lead to the following conclusions: (1) The system clock is constant and independent of workload in the tested cases; (2) the instruction execution times are constant; (3) the R4 frame size is 40mS with some variation; (4) the frame stretching mechanism has some flaws in its implementation that allow the possibility of an infinite stretching of frame duration. Future experiments are planned. Some will broaden the results of these initial experiments. Others will measure the system more dynamically. The implementation of a synthetic workload generation mechanism for FTMP is planned to enhance the experimental environment of the system.
OPAD-EDIFIS Real-Time Processing

NASA Technical Reports Server (NTRS)

Katsinis, Constantine

1997-01-01

The Optical Plume Anomaly Detection (OPAD) detects engine hardware degradation of flight vehicles through identification and quantification of elemental species found in the plume by analyzing the plume emission spectra in a real-time mode. Real-time performance of OPAD relies on extensive software which must report metal amounts in the plume faster than once every 0.5 sec. OPAD software previously written by NASA scientists performed most necessary functions at speeds which were far below what is needed for real-time operation. The research presented in this report improved the execution speed of the software by optimizing the code without changing the algorithms and converting it into a parallelized form which is executed in a shared-memory multiprocessor system. The resulting code was subjected to extensive timing analysis. The report also provides suggestions for further performance improvement by (1) identifying areas of algorithm optimization, (2) recommending commercially available multiprocessor architectures and operating systems to support real-time execution and (3) presenting an initial study of fault-tolerance requirements.

Parallel Navier-Stokes computations on shared and distributed memory architectures

NASA Technical Reports Server (NTRS)

Hayder, M. Ehtesham; Jayasimha, D. N.; Pillay, Sasi Kumar

1995-01-01

We study a high order finite difference scheme to solve the time accurate flow field of a jet using the compressible Navier-Stokes equations. As part of our ongoing efforts, we have implemented our numerical model on three parallel computing platforms to study the computational, communication, and scalability characteristics. The platforms chosen for this study are a cluster of workstations connected through fast networks (the LACE experimental testbed at NASA Lewis), a shared memory multiprocessor (the Cray YMP), and a distributed memory multiprocessor (the IBM SPI). Our focus in this study is on the LACE testbed. We present some results for the Cray YMP and the IBM SP1 mainly for comparison purposes. On the LACE testbed, we study: (1) the communication characteristics of Ethernet, FDDI, and the ALLNODE networks and (2) the overheads induced by the PVM message passing library used for parallelizing the application. We demonstrate that clustering of workstations is effective and has the potential to be computationally competitive with supercomputers at a fraction of the cost.
Solving Partial Differential Equations in a data-driven multiprocessor environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaudiot, J.L.; Lin, C.M.; Hosseiniyar, M.

1988-12-31

Partial differential equations can be found in a host of engineering and scientific problems. The emergence of new parallel architectures has spurred research in the definition of parallel PDE solvers. Concurrently, highly programmable systems such as data-how architectures have been proposed for the exploitation of large scale parallelism. The implementation of some Partial Differential Equation solvers (such as the Jacobi method) on a tagged token data-flow graph is demonstrated here. Asynchronous methods (chaotic relaxation) are studied and new scheduling approaches (the Token No-Labeling scheme) are introduced in order to support the implementation of the asychronous methods in a data-driven environment.more » New high-level data-flow language program constructs are introduced in order to handle chaotic operations. Finally, the performance of the program graphs is demonstrated by a deterministic simulation of a message passing data-flow multiprocessor. An analysis of the overhead in the data-flow graphs is undertaken to demonstrate the limits of parallel operations in dataflow PDE program graphs.« less
Proceedings of the second SISAL users` conference

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J T; Frerking, C; Miller, P J

1992-12-01

This report contains papers on the following topics: A sisal code for computing the fourier transform on S{sub N}; five ways to fill your knapsack; simulating material dislocation motion in sisal; candis as an interface for sisal; parallelisation and performance of the burg algorithm on a shared-memory multiprocessor; use of genetic algorithm in sisal to solve the file design problem; implementing FFT`s in sisal; programming and evaluating the performance of signal processing applications in the sisal programming environment; sisal and Von Neumann-based languages: translation and intercommunication; an IF2 code generator for ADAM architecture; program partitioning for NUMA multiprocessor computer systems;more » mapping functional parallelism on distributed memory machines; implicit array copying: prevention is better than cure ; mathematical syntax for sisal; an approach for optimizing recursive functions; implementing arrays in sisal 2.0; Fol: an object oriented extension to the sisal language; twine: a portable, extensible sisal execution kernel; and investigating the memory performance of the optimizing sisal compiler.« less
Compilation time analysis to minimize run-time overhead in preemptive scheduling on multiprocessors

NASA Astrophysics Data System (ADS)

Wauters, Piet; Lauwereins, Rudy; Peperstraete, J.

1994-10-01

This paper describes a scheduling method for hard real-time Digital Signal Processing (DSP) applications, implemented on a multi-processor. Due to the very high operating frequencies of DSP applications (typically hundreds of kHz) runtime overhead should be kept as small as possible. Because static scheduling introduces very little run-time overhead it is used as much as possible. Dynamic pre-emption of tasks is allowed if and only if it leads to better performance in spite of the extra run-time overhead. We essentially combine static scheduling with dynamic pre-emption using static priorities. Since we are dealing with hard real-time applications we must be able to guarantee at compile-time that all timing requirements will be satisfied at run-time. We will show that our method performs at least as good as any static scheduling method. It also reduces the total amount of dynamic pre-emptions compared with run time methods like deadline monotonic scheduling.
The Experimental Mathematician: The Pleasure of Discovery and the Role of Proof

ERIC Educational Resources Information Center

Borwein, Jonathan M.

2005-01-01

The emergence of powerful mathematical computing environments, the growing availability of correspondingly powerful (multi-processor) computers and the pervasive presence of the Internet allow for mathematicians, students and teachers, to proceed heuristically and "quasi-inductively." We may increasingly use symbolic and numeric computation,…
Experience with a UNIX based batch computing facility for H1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gerhards, R.; Kruener-Marquis, U.; Szkutnik, Z.

1994-12-31

A UNIX based batch computing facility for the H1 experiment at DESY is described. The ultimate goal is to replace the DESY IBM mainframe by a multiprocessor SGI Challenge series computer, using the UNIX operating system, for most of the computing tasks in H1.
Model Checking, Abstraction, and Compositional Verification

DTIC Science & Technology

1993-07-01

the ( alois connections used by Bensalrnu el al. [6], and also has some relation to Kurshan’s automata homonuor- phisms [62]. (Actually. we can impose a...multiprocessor simulation model. ACM Transactions on Computer Systems, 4(4):273-298, November 1986. [41 D. L. Beatty, R. E. Bryant, and C.-J. Seger
Developing Software to Use Parallel Processing Effectively

DTIC Science & Technology

1988-10-01

Experience, Vol 15(6), June 1985, p53 Gajski85 Gajski , Daniel D. and Jih-Kwon Peir, "Essential Issues in Multiprocessor Systems", IEEE Computer, June...Treleaven (eds.), Springer-Verlag, pp. 213-225 (June 1987). Kuck83 David Kuck, Duncan Lawrie, Ron Cytron, Ahmed Sameh and Daniel Gajski , The Architecture and
Expert Systems on Multiprocessor Architectures. Volume 2. Technical Reports

DTIC Science & Technology

1991-06-01

Report RC 12936 (#58037). IBM T. J. Wartson Reiearch Center. July 1987. � Alan Jay Smith. Cache memories. Coniputing Sitrry., 1.1(3): I.3-5:30...basic-shared is an instrument for ashared memory design. The components panels are processor- qload-scrolling-bar-panel, memory-qload-scrolling-bar-panel
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.

ERIC Educational Resources Information Center

Stewart, Mark; Willett, Peter

1987-01-01

Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Expert Systems on Multiprocessor Architectures. Volume 4. Technical Reports

DTIC Science & Technology

1991-06-01

Floated-Current-Time0 -> The time that this function is called in user time uflts, expressed as a floating point number. Halt- Poligono Arrests the...default a statistics file will be printed out, if it can be. To prevent this make No-Statistics true. Unhalt- Poligono Unarrests the process in which the
Clocking and Synchronization Circuits in Multiprocessor Systems

DTIC Science & Technology

1989-04-01

18 3.4 Inter -chip Clocking Strategies...may occur when two or more of the switches make transitions at different times during the inter - val during which those inputs are being processed...increased without any fruitful computation. The sources of the inter -chip clock skew are the electromagnetic propagation delay, the buffer delay within
Cache coherency without line exclusivity in MP systems having store-in caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pomerene, J.H.; Puzak, T.R.; Rechtschaffen, R.N.

1983-11-01

By modifying the function of the storage control unit, a multiprocessor (MP) system having store-in caches is enabled to operate with the same versatility as an MP system having store-through caches, thereby eliminating the requirement for line exclusivity and greatly reducing the occurrence of cross-interrogates.
Modeling and simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hanham, R.; Vogt, W.G.; Mickle, M.H.

1986-01-01

This book presents the papers given at a conference on computerized simulation. Topics considered at the conference included expert systems, modeling in electric power systems, power systems operating strategies, energy analysis, a linear programming approach to optimum load shedding in transmission systems, econometrics, simulation in natural gas engineering, solar energy studies, artificial intelligence, vision systems, hydrology, multiprocessors, and flow models.
Hardware for Accelerating N-Modular Redundant Systems for High-Reliability Computing

NASA Technical Reports Server (NTRS)

Dobbs, Carl, Sr.

2012-01-01

A hardware unit has been designed that reduces the cost, in terms of performance and power consumption, for implementing N-modular redundancy (NMR) in a multiprocessor device. The innovation monitors transactions to memory, and calculates a form of sumcheck on-the-fly, thereby relieving the processors of calculating the sumcheck in software
Data Traffic Reduction Schemes for Cholesky Factorization on Asynchronous Multiprocessor Systems

DTIC Science & Technology

1989-06-01

Engineering NASA Langley Research Center Hampton, Virginia 23665-5225 Operated by the Universities Space Research Association DTIC ELECTE NASA jAUG 23...Hampton, VA 23665. ti- 1. Introduction Consider the problem of solving a system of linear equations Ax=b where .4 is an n x n symmetric, positive
Estimating Performance of Single Bus, Shared Memory Multiprocessors

DTIC Science & Technology

1987-05-01

Chandy78] K.M. Chandy, C.M. Sauer, "Approximate methods for analyzing queuing network models of computing systems," Computing Surveys, vol10 , no 3...Denning78] P. Denning, J. Buzen, "The operational analysis of queueing network models", Computing Sur- veys, vol10 , no 3, September 1978, pp 225-261
3D Navier-Stokes Flow Analysis for a Large-Array Multiprocessor

DTIC Science & Technology

1989-04-17

computer, Alliant’s FX /8, Intel’s Hypercube, and Encore’s Multimax. Unfortunately, the current algorithms have been developed pri- marily for SISD machines...Reversing and Thrust-Vectoring Nozzle Flows," Ph.D. Dissertation in the Dept. of Aero. and Astro ., Univ. of Wash., Washington, 1986. [11] Anderson
MULTIPROCESSOR AND DISTRIBUTED PROCESSING BIBLIOGRAPHIC DATA BASE SOFTWARE SYSTEM

NASA Technical Reports Server (NTRS)

Miya, E. N.

1994-01-01

Multiprocessors and distributed processing are undergoing increased scientific scrutiny for many reasons. It is more and more difficult to keep track of the existing research in these fields. This package consists of a large machine-readable bibliographic data base which, in addition to the usual keyword searches, can be used for producing citations, indexes, and cross-references. The data base is compiled from smaller existing multiprocessing bibliographies, and tables of contents from journals and significant conferences. There are approximately 4,000 entries covering topics such as parallel and vector processing, networks, supercomputers, fault-tolerant computers, and cellular automata. Each entry is represented by 21 fields including keywords, author, referencing book or journal title, volume and page number, and date and city of publication. The data base contains UNIX 'refer' formatted ASCII data and can be implemented on any computer running under the UNIX operating system. The data base requires approximately one megabyte of secondary storage. The documentation for this program is included with the distribution tape, although it can be purchased for the price below. This bibliography was compiled in 1985 and updated in 1988.
Design and control of a macro-micro robot for precise force applications

NASA Technical Reports Server (NTRS)

Wang, Yulun; Mangaser, Amante; Laby, Keith; Jordan, Steve; Wilson, Jeff

1993-01-01

Creating a robot which can delicately interact with its environment has been the goal of much research. Primarily two difficulties have made this goal hard to attain. The execution of control strategies which enable precise force manipulations are difficult to implement in real time because such algorithms have been too computationally complex for available controllers. Also, a robot mechanism which can quickly and precisely execute a force command is difficult to design. Actuation joints must be sufficiently stiff, frictionless, and lightweight so that desired torques can be accurately applied. This paper describes a robotic system which is capable of delicate manipulations. A modular high-performance multiprocessor control system was designed to provide sufficient compute power for executing advanced control methods. An 8 degree of freedom macro-micro mechanism was constructed to enable accurate tip forces. Control algorithms based on the impedance control method were derived, coded, and load balanced for maximum execution speed on the multiprocessor system. Delicate force tasks such as polishing, finishing, cleaning, and deburring, are the target applications of the robot.

Distributed Virtual System (DIVIRS) Project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, B. Clifford

1993-01-01

As outlined in our continuation proposal 92-ISI-50R (revised) on contract NCC 2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to program parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the virtual system model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
A parallel row-based algorithm with error control for standard-cell replacement on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Sargent, Jeff Scott

1988-01-01

A new row-based parallel algorithm for standard-cell placement targeted for execution on a hypercube multiprocessor is presented. Key features of this implementation include a dynamic simulated-annealing schedule, row-partitioning of the VLSI chip image, and two novel new approaches to controlling error in parallel cell-placement algorithms; Heuristic Cell-Coloring and Adaptive (Parallel Move) Sequence Control. Heuristic Cell-Coloring identifies sets of noninteracting cells that can be moved repeatedly, and in parallel, with no buildup of error in the placement cost. Adaptive Sequence Control allows multiple parallel cell moves to take place between global cell-position updates. This feedback mechanism is based on an error bound derived analytically from the traditional annealing move-acceptance profile. Placement results are presented for real industry circuits and the performance is summarized of an implementation on the Intel iPSC/2 Hypercube. The runtime of this algorithm is 5 to 16 times faster than a previous program developed for the Hypercube, while producing equivalent quality placement. An integrated place and route program for the Intel iPSC/2 Hypercube is currently being developed.
DIstributed VIRtual System (DIVIRS) project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, B. Clifford

1994-01-01

As outlined in our continuation proposal 92-ISI-. OR (revised) on NASA cooperative agreement NCC2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the Virtual System Model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
DIstributed VIRtual System (DIVIRS) project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, Clifford B.

1995-01-01

As outlined in our continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement NCC2-539, we are (1) developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; (2) developing communications routines that support the abstractions implemented in item one; (3) continuing the development of file and information systems based on the Virtual System Model; and (4) incorporating appropriate security measures to allow the mechanisms developed in items 1 through 3 to be used on an open network. The goal throughout our work is to provide a uniform model that can be applied to both parallel and distributed systems. We believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. Our work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
Distributed Virtual System (DIVIRS) project

NASA Technical Reports Server (NTRS)

Schorr, Herbert; Neuman, B. Clifford

1993-01-01

As outlined in the continuation proposal 92-ISI-50R (revised) on NASA cooperative agreement NCC 2-539, the investigators are developing software, including a system manager and a job manager, that will manage available resources and that will enable programmers to develop and execute parallel applications in terms of a virtual configuration of processors, hiding the mapping to physical nodes; developing communications routines that support the abstractions implemented; continuing the development of file and information systems based on the Virtual System Model; and incorporating appropriate security measures to allow the mechanisms developed to be used on an open network. The goal throughout the work is to provide a uniform model that can be applied to both parallel and distributed systems. The authors believe that multiprocessor systems should exist in the context of distributed systems, allowing them to be more easily shared by those that need them. The work provides the mechanisms through which nodes on multiprocessors are allocated to jobs running within the distributed system and the mechanisms through which files needed by those jobs can be located and accessed.
Prefetching in file systems for MIMD multiprocessors

NASA Technical Reports Server (NTRS)

Kotz, David F.; Ellis, Carla Schlatter

1990-01-01

The question of whether prefetching blocks on the file into the block cache can effectively reduce overall execution time of a parallel computation, even under favorable assumptions, is considered. Experiments have been conducted with an interleaved file system testbed on the Butterfly Plus multiprocessor. Results of these experiments suggest that (1) the hit ratio, the accepted measure in traditional caching studies, may not be an adequate measure of performance when the workload consists of parallel computations and parallel file access patterns, (2) caching with prefetching can significantly improve the hit ratio and the average time to perform an I/O (input/output) operation, and (3) an improvement in overall execution time has been observed in most cases. In spite of these gains, prefetching sometimes results in increased execution times (a negative result, given the optimistic nature of the study). The authors explore why it is not trivial to translate savings on individual I/O requests into consistently better overall performance and identify the key problems that need to be addressed in order to improve the potential of prefetching techniques in the environment.
Sparse Gaussian elimination with controlled fill-in on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Alaghband, Gita; Jordan, Harry F.

1989-01-01

It is shown that in sparse matrices arising from electronic circuits, it is possible to do computations on many diagonal elements simultaneously. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. The ordering is based on the Markowitz number of the pivot candidates. This technique generates a set of compatible pivots with the property of generating few fills. A novel heuristic algorithm is presented that combines the idea of an order-compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. An elimination set for reducing the matrix is generated and selected on the basis of a minimum Markowitz sum number. The parallel pivoting technique presented is a stepwise algorithm and can be applied to any submatrix of the original matrix. Thus, it is not a preordering of the sparse matrix and is applied dynamically as the decomposition proceeds. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices using the HEP multiprocessor (Kowalik, 1985) are presented and analyzed.
Controlled replication: reduce the capacity occupied by redundant replicas in tiled chip multiprocessors

NASA Astrophysics Data System (ADS)

Li, Hao; Xie, Lunguo

2013-03-01

The design of cache system for Chip Multiprocessor (CMP) face many challenges because future CMPs will have more cores and greater on-chip cache capacity. There are two base design schemes about L2 cache: private scheme in which each L2 slice is treated as a private L2 cache and shared scheme in which all L2 slices are treated as a large L2 cache shared by all cores. Private caches provide the lowest hit latency but reduce the total effective cache capacity. A shared L2 cache increases the effective cache capacity but has long hit latencies when data is on a remote tile. This paper present a new Controlled Replication (CR) policy to reduce the capacities occupied by redundant shared replicas. the new CR policy increases the effective capacity than victim replication scheme and has lower hit latency than shared scheme. We evaluate the various schemes using full-system simulation of parallel applications. Results show that CR reduces the average memory access latency of shared scheme by an average of 13%, providing better overall performance than victim replication and shared schemes.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors

NASA Technical Reports Server (NTRS)

Probst, David K.

1993-01-01

A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
Engineering study for the functional design of a multiprocessor system

NASA Technical Reports Server (NTRS)

Miller, J. S.; Vandever, W. H.; Stanten, S. F.; Avakian, A. E.; Kosmala, A. L.

1972-01-01

The results are presented of a study to generate a functional system design of a multiprocessing computer system capable of satisfying the computational requirements of a space station. These data management system requirements were specified to include: (1) real time control, (2) data processing and storage, (3) data retrieval, and (4) remote terminal servicing.
Design Tools for Evaluating Multiprocessor Programs

DTIC Science & Technology

1976-07-01

than large uniprocessing machines, and 2. economies of scale in manufacturing. Perhaps the most compelling reason (possibly a consequence of the...speed, redundancy, (inefficiency, resource utilization, and economies of the components. [Browne 73, Lehman 66] 6. How can the system be scheduled...mejsures are interesting about the computation? Somn may be: speed, redundancy, (inefficiency, resource utilization, and economies of the components
Design of a modular digital computer system, DRL 4. [for meeting future requirements of spaceborne computers

NASA Technical Reports Server (NTRS)

1972-01-01

The design is reported of an advanced modular computer system designated the Automatically Reconfigurable Modular Multiprocessor System, which anticipates requirements for higher computing capacity and reliability for future spaceborne computers. Subjects discussed include: an overview of the architecture, mission analysis, synchronous and nonsynchronous scheduling control, reliability, and data transmission.
Process for predicting structural performance of mechanical systems

DOEpatents

Gardner, David R.; Hendrickson, Bruce A.; Plimpton, Steven J.; Attaway, Stephen W.; Heinstein, Martin W.; Vaughan, Courtenay T.

1998-01-01

A process for predicting the structural performance of a mechanical system represents the mechanical system by a plurality of surface elements. The surface elements are grouped according to their location in the volume occupied by the mechanical system so that contacts between surface elements can be efficiently located. The process is well suited for efficient practice on multiprocessor computers.
Random Walk Method for Potential Problems

NASA Technical Reports Server (NTRS)

Krishnamurthy, T.; Raju, I. S.

2002-01-01

A local Random Walk Method (RWM) for potential problems governed by Lapalace's and Paragon's equations is developed for two- and three-dimensional problems. The RWM is implemented and demonstrated in a multiprocessor parallel environment on a Beowulf cluster of computers. A speed gain of 16 is achieved as the number of processors is increased from 1 to 23.
A manual for PARTI runtime primitives

NASA Technical Reports Server (NTRS)

Berryman, Harry; Saltz, Joel

1990-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Software Epistemology

DTIC Science & Technology

2016-03-01

in-vitro decision to incubate a startup, Lexumo [7], which is developing a commercial Software as a Service ( SaaS ) vulnerability assessment...LTS Label Transition System MUSE Mining and Understanding Software Enclaves RTEMS Real-Time Executive for Multi-processor Systems SaaS Software ...as a Service SSA Static Single Assignment SWE Software Epistemology UD/DU Def-Use/Use-Def Chains (Dataflow Graph)
A C++ Thread Package for Concurrent and Parallel Programming

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jie Chen; William Watson

1999-11-01

Recently thread libraries have become a common entity on various operating systems such as Unix, Windows NT and VxWorks. Those thread libraries offer significant performance enhancement by allowing applications to use multiple threads running either concurrently or in parallel on multiprocessors. However, the incompatibilities between native libraries introduces challenges for those who wish to develop portable applications.
Attacking the One-Out-Of-m Multicore Problem by Combining Hardware Management with Mixed-Criticality Provisioning

DTIC Science & Technology

2015-05-01

LLC and DRAM banks. For each µB task and isolation configuration, we ran experiments with all 256 possible LLC area sizes (given by 1 to 16 ways and 1...isolation on multicoore platforms. In RTAS ’14. [29] H. Yun, G. Yao, R. Pellizzoni, M. Caccamo, and L. Sha . Memory access control in multiprocessor
Process for predicting structural performance of mechanical systems

DOEpatents

Gardner, D.R.; Hendrickson, B.A.; Plimpton, S.J.; Attaway, S.W.; Heinstein, M.W.; Vaughan, C.T.

1998-05-19

A process for predicting the structural performance of a mechanical system represents the mechanical system by a plurality of surface elements. The surface elements are grouped according to their location in the volume occupied by the mechanical system so that contacts between surface elements can be efficiently located. The process is well suited for efficient practice on multiprocessor computers. 12 figs.
Cache directory lookup reader set encoding for partial cache line speculation support

DOEpatents

Gara, Alan; Ohmacht, Martin

2014-10-21

In a multiprocessor system, with conflict checking implemented in a directory lookup of a shared cache memory, a reader set encoding permits dynamic recordation of read accesses. The reader set encoding includes an indication of a portion of a line read, for instance by indicating boundaries of read accesses. Different encodings may apply to different types of speculative execution.

Lattice Boltzmann Methods for Fluid Structure Interaction

DTIC Science & Technology

2012-09-01

MONTEREY, CALIFORNIA DISSERTATION LATTICE BOLTZMANN METHODS FOR FLUID STRUCTURE INTERACTION by Stuart R. Blair September 2012 Dissertation Supervisor...200 words) The use of lattice Boltzmann methods (LBM) for fluid flow and its coupling with finite element method (FEM) structural models for fluid... structure interaction (FSI) is investigated. A body of high performance LBM software that exploits graphic processing unit (GPU) and multiprocessor
A manual for PARTI runtime primitives, revision 1

NASA Technical Reports Server (NTRS)

Das, Raja; Saltz, Joel; Berryman, Harry

1991-01-01

Primitives are presented that are designed to help users efficiently program irregular problems (e.g., unstructured mesh sweeps, sparse matrix codes, adaptive mesh partial differential equations solvers) on distributed memory machines. These primitives are also designed for use in compilers for distributed memory multiprocessors. Communications patterns are captured at runtime, and the appropriate send and receive messages are automatically generated.
Proceedings of the international conference on cybernetics and societ

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

1985-01-01

This book presents the papers given at a conference on artificial intelligence, expert systems and knowledge bases. Topics considered at the conference included automating expert system development, modeling expert systems, causal maps, data covariances, robot vision, image processing, multiprocessors, parallel processing, VLSI structures, man-machine systems, human factors engineering, cognitive decision analysis, natural language, computerized control systems, and cybernetics.
Memory management and compiler support for rapid recovery from failures in computer systems

NASA Technical Reports Server (NTRS)

Fuchs, W. K.

1991-01-01

This paper describes recent developments in the use of memory management and compiler technology to support rapid recovery from failures in computer systems. The techniques described include cache coherence protocols for user transparent checkpointing in multiprocessor systems, compiler-based checkpoint placement, compiler-based code modification for multiple instruction retry, and forward recovery in distributed systems utilizing optimistic execution.
Large scale multiprocessor

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gajski, D.; Kuck, D.; Lawrie, D.

1983-03-01

The primary goal of the Cedar project is to demonstrate that supercomputers of the future can exhibit general purpose behavior and be easy to use. The Cedar project is based on five key developments which have reached fruition in the past year and taken together offer a comprehensive solution to these problems. The author looks at this project, and how its goals are being met.
Scheduling for energy and reliability management on multiprocessor real-time systems

NASA Astrophysics Data System (ADS)

Qi, Xuan

Scheduling algorithms for multiprocessor real-time systems have been studied for years with many well-recognized algorithms proposed. However, it is still an evolving research area and many problems remain open due to their intrinsic complexities. With the emergence of multicore processors, it is necessary to re-investigate the scheduling problems and design/develop efficient algorithms for better system utilization, low scheduling overhead, high energy efficiency, and better system reliability. Focusing cluster schedulings with optimal global schedulers, we study the utilization bound and scheduling overhead for a class of cluster-optimal schedulers. Then, taking energy/power consumption into consideration, we developed energy-efficient scheduling algorithms for real-time systems, especially for the proliferating embedded systems with limited energy budget. As the commonly deployed energy-saving technique (e.g. dynamic voltage frequency scaling (DVFS)) will significantly affect system reliability, we study schedulers that have intelligent mechanisms to recuperate system reliability to satisfy the quality assurance requirements. Extensive simulation is conducted to evaluate the performance of the proposed algorithms on reduction of scheduling overhead, energy saving, and reliability improvement. The simulation results show that the proposed reliability-aware power management schemes could preserve the system reliability while still achieving substantial energy saving.
Accelerating the Gillespie Exact Stochastic Simulation Algorithm using hybrid parallel execution on graphics processing units.

PubMed

Komarov, Ivan; D'Souza, Roshan M

2012-01-01

The Gillespie Stochastic Simulation Algorithm (GSSA) and its variants are cornerstone techniques to simulate reaction kinetics in situations where the concentration of the reactant is too low to allow deterministic techniques such as differential equations. The inherent limitations of the GSSA include the time required for executing a single run and the need for multiple runs for parameter sweep exercises due to the stochastic nature of the simulation. Even very efficient variants of GSSA are prohibitively expensive to compute and perform parameter sweeps. Here we present a novel variant of the exact GSSA that is amenable to acceleration by using graphics processing units (GPUs). We parallelize the execution of a single realization across threads in a warp (fine-grained parallelism). A warp is a collection of threads that are executed synchronously on a single multi-processor. Warps executing in parallel on different multi-processors (coarse-grained parallelism) simultaneously generate multiple trajectories. Novel data-structures and algorithms reduce memory traffic, which is the bottleneck in computing the GSSA. Our benchmarks show an 8×-120× performance gain over various state-of-the-art serial algorithms when simulating different types of models.
Software Coherence in Multiprocessor Memory Systems. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Bolosky, William Joseph

1993-01-01

Processors are becoming faster and multiprocessor memory interconnection systems are not keeping up. Therefore, it is necessary to have threads and the memory they access as near one another as possible. Typically, this involves putting memory or caches with the processors, which gives rise to the problem of coherence: if one processor writes an address, any other processor reading that address must see the new value. This coherence can be maintained by the hardware or with software intervention. Systems of both types have been built in the past; the hardware-based systems tended to outperform the software ones. However, the ratio of processor to interconnect speed is now so high that the extra overhead of the software systems may no longer be significant. This issue is explored both by implementing a software maintained system and by introducing and using the technique of offline optimal analysis of memory reference traces. It finds that in properly built systems, software maintained coherence can perform comparably to or even better than hardware maintained coherence. The architectural features necessary for efficient software coherence to be profitable include a small page size, a fast trap mechanism, and the ability to execute instructions while remote memory references are outstanding.
Partitioning in parallel processing of production systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oflazer, K.

1987-01-01

This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpretermore » with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.« less
Recent Progress on the Parallel Implementation of Moving-Body Overset Grid Schemes

NASA Technical Reports Server (NTRS)

Wissink, Andrew; Allen, Edwin (Technical Monitor)

1998-01-01

Viscous calculations about geometrically complex bodies in which there is relative motion between component parts is one of the most computationally demanding problems facing CFD researchers today. This presentation documents results from the first two years of a CHSSI-funded effort within the U.S. Army AFDD to develop scalable dynamic overset grid methods for unsteady viscous calculations with moving-body problems. The first pan of the presentation will focus on results from OVERFLOW-D1, a parallelized moving-body overset grid scheme that employs traditional Chimera methodology. The two processes that dominate the cost of such problems are the flow solution on each component and the intergrid connectivity solution. Parallel implementations of the OVERFLOW flow solver and DCF3D connectivity software are coupled with a proposed two-part static-dynamic load balancing scheme and tested on the IBM SP and Cray T3E multi-processors. The second part of the presentation will cover some recent results from OVERFLOW-D2, a new flow solver that employs Cartesian grids with various levels of refinement, facilitating solution adaption. A study of the parallel performance of the scheme on large distributed- memory multiprocessor computer architectures will be reported.
A Performance Prediction Model for a Fault-Tolerant Computer During Recovery and Restoration

NASA Technical Reports Server (NTRS)

Obando, Rodrigo A.; Stoughton, John W.

1995-01-01

The modeling and design of a fault-tolerant multiprocessor system is addressed. Of interest is the behavior of the system during recovery and restoration after a fault has occurred. The multiprocessor systems are based on the Algorithm to Architecture Mapping Model (ATAMM) and the fault considered is the death of a processor. The developed model is useful in the determination of performance bounds of the system during recovery and restoration. The performance bounds include time to recover from the fault, time to restore the system, and determination of any permanent delay in the input to output latency after the system has regained steady state. Implementation of an ATAMM based computer was developed for a four-processor generic VHSIC spaceborne computer (GVSC) as the target system. A simulation of the GVSC was also written on the code used in the ATAMM Multicomputer Operating System (AMOS). The simulation is used to verify the new model for tracking the propagation of the delay through the system and predicting the behavior of the transient state of recovery and restoration. The model is shown to accurately predict the transient behavior of an ATAMM based multicomputer during recovery and restoration.
Systems analysis of the space shuttle. [communication systems, computer systems, and power distribution

NASA Technical Reports Server (NTRS)

Schilling, D. L.; Oh, S. J.; Thau, F.

1975-01-01

Developments in communications systems, computer systems, and power distribution systems for the space shuttle are described. The use of high speed delta modulation for bit rate compression in the transmission of television signals is discussed. Simultaneous Multiprocessor Organization, an approach to computer organization, is presented. Methods of computer simulation and automatic malfunction detection for the shuttle power distribution system are also described.
Laboratory for Computer Science Progress Report 19, 1 July 1981-30 June 1982.

DTIC Science & Technology

1984-05-01

Multiprocessor Architectures 202 4. TRIX Operating System 209 5. VLSI Tools 212 ’SYSTEMATIC PROGRAM DEVELOPMENT, 221 1. Introduction 222 2. Specification...exploring distributed operating systems and the architecture of single-user powerful computers that are interconnected by communication networks. The...to now. In particular, we expect to experiment with languages, operating systems , and applications that establish the feasibility of distributed
Expert Systems on Multiprocessor Architectures. Phase 1

DTIC Science & Technology

1988-08-01

great rate) as early experience indicates what alternative aspect of system operation should have been monitored in any given completed run. The... system operation should have been monitored in any given completed run. The design goals that emerged then were (1) that the simulation system should...ORGANIZATION 6b OFFICE SYMBOL 7a. NAME OF MONITORING ORGANIZATION Stanford University (If applicable) Knowledge Systems Laboratory Rome Air Development
Specification and Verification of Secure Concurrent and Distributed Software Systems

DTIC Science & Technology

1992-02-01

primitive search strategies work for operating systems that contain relatively few operations . As the number of operations increases, so does the the...others have granted him access to, etc . The burden of security falls on the operating system , although appropriate hardware support can minimize the...Guttag, J. Horning, and R. Levin. Synchronization primitives for a multiprocessor: a formal specification. Symposium on Operating System Principles
Fault-Tolerant Multiprocessor and VLSI-Based Systems.

DTIC Science & Technology

1987-03-15

54590 170 Table 1: Statistics for the Benchmark Programs pages are distributed amongst the groups of the reconfigured memory in proportion to the...distances are proportional to only the logarithm of the sure that possesses relevance to a system which consists of alare nmbe ofhomgenouseleent...and comn.unication overhead resulting from faults communicating with all of the other elements in the system the network to degrade proportionately to
Domain decomposition preconditioners for the spectral collocation method

NASA Technical Reports Server (NTRS)

Quarteroni, Alfio; Sacchilandriani, Giovanni

1988-01-01

Several block iteration preconditioners are proposed and analyzed for the solution of elliptic problems by spectral collocation methods in a region partitioned into several rectangles. It is shown that convergence is achieved with a rate which does not depend on the polynomial degree of the spectral solution. The iterative methods here presented can be effectively implemented on multiprocessor systems due to their high degree of parallelism.
Comparisons between Intel 386 and i486 microprecessors

NASA Technical Reports Server (NTRS)

Liu, Yuan-Kwei

1989-01-01

A quick and preliminary comparison is made between the Intel 386 and i486 microprocessors. The following topics are discussed: the i486 key elements, comparison of instruction set architecture, the i486 on-chip cache characteristics, the i486 multiprocessor support, comparison of performance, comparison of power consumption, comparison of radiation hardening potential, and recommendations for the Space Station Freedom (SSF) Data Management System (DMS).
Address tracing for parallel machines

NASA Technical Reports Server (NTRS)

Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent

1991-01-01

Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Graphical Representation of Parallel Algorithmic Processes

DTIC Science & Technology

1990-12-01

interface with the AAARF main process . The source code for the AAARF class-common library is in the common subdi- rectory and consists of the following files... for public release; distribution unlimited AFIT/GCE/ENG/90D-07 Graphical Representation of Parallel Algorithmic Processes THESIS Presented to the...goal of this study is to develop an algorithm animation facility for parallel processes executing on different architectures, from multiprocessor

The Triangle: a Multiprocessor Architecture for Fast Curve and Surface Generation.

DTIC Science & Technology

1987-08-01

design , curves and surfaces, graphics hardware. 20...curves, B-splines, computer-aided geometric design ; curves and sur- faces, graphics hardware. (k 12). -/ .... This work was supported in part by the...34 Electronic Design , October 30, 1986. 21. M. A. Penna and R. R. Patterson, Projective Geometry and its Applications to Computer Graphics , Prentice-Hall, Englewood Cliffs, N.J., 1985. 70,e, 41100vr -~ ~ - -- --
Information Processing Research.

DTIC Science & Technology

1988-05-01

concentrated mainly on the Hitech chess machine, which achieves its success from parallelism in the right places. Hitech has now reached a National rating...includes local user workstations, a set of central server workstations each acting as a host for a Warp machine, and a few Warp multiprocessors. The... successful completion. A quorum for an operation is any such set of sites. Neces- sary and sufficient constraints on quorum intersections are derived
A divide and conquer approach to the nonsymmetric eigenvalue problem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1991-01-01

Serial computation combined with high communication costs on distributed-memory multiprocessors make parallel implementations of the QR method for the nonsymmetric eigenvalue problem inefficient. This paper introduces an alternative algorithm for the nonsymmetric tridiagonal eigenvalue problem based on rank two tearing and updating of the matrix. The parallelism of this divide and conquer approach stems from independent solution of the updating problems. 11 refs.
Distributed processing for features improvement in real-time portable medical devices.

PubMed

Mercado, Erwin John Saavedra

2008-01-01

Portable biomedical devices are being developed and incorporated in daily life. Nevertheless, their standalone capacity is diminished due to the lack of processing power required to face such duties as for example, signal artifacts robustness in EKG monitor devices. The following paper presents a multiprocessor architecture made from simple microcontrollers to provide an increase in processing performance, power consumption efficiency and lower cost.
USSR Report: Cybernetics, Computers and Automation Technology. No. 69.

DTIC Science & Technology

1983-05-06

computers in multiprocessor and multistation design , control and scientific research automation systems. The results of comparing the efficiency of...Podvizhnaya, Scientific Research Institute of Control Computers, Severodonetsk] [Text] The most significant change in the design of the SM-2M compared to...UPRAVLYAYUSHCHIYE SISTEMY I MASHINY, Nov-Dec 82) 95 APPLICATIONS Kiev Automated Control System, Design Features and Prospects for Development (V. A
Image Understanding and Intelligent Parallel Systems

DTIC Science & Technology

1991-05-09

a common user interface for the interactive , graphical manipulation of those histories, and...Circuits and Systems, August 1987. Yap, S.-K. and M.L. Scott, "PenGuin: A language for reactive graphical user interface programming," to appear, INTERACT 󈨞, Cambridge, United Kingdom, 1990. 74 ...of up to a factor of 100 over single-workstation implementations. User interfaces to large multiprocessor computers are a difficult issue addressed
Cache directory look-up re-use as conflict check mechanism for speculative memory requests

DOEpatents

Ohmacht, Martin

2013-09-10

In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.
Advanced Numerical Techniques of Performance Evaluation. Volume 2

DTIC Science & Technology

1990-06-01

multiprocessor environment. This factor is determined by the overhead of the primitives available in the system ( semaphore , monitor , or message... semaphore , monitor , or message passing primitives ) and U the programming ability of the user who implements the simulation. " t,: the sequential...Warp Operating System . i Pro" lftevcnth ACM Symposum on Operating Systems Princlplcs, pages 77 9:3, Auslin, TX, Nov wicr 1987. ACM. [121 D.R. Jefferson
A Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced Memories

DTIC Science & Technology

1989-02-01

frames per second, font generation directly from conic spline descriptions, and rapid calculation of radiosity form factors. The hardware consists of...generality for rendering curved surfaces, volume data, objects dcscri id with Constructive Solid Geometry, for rendering scenes using the radiosity ...f.aces and for computing a spherical radiosity lighting model (see Section 7.6). Custom Memory Chips \\ 208 bits x 128 pixels - Renderer Board ix p o a
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
A Common Interface Real-Time Multiprocessor Operating System for Embedded Systems

DTIC Science & Technology

1991-03-04

Pressman , a design methodology should show hierarchical organization, lead to modules exhibiting independent functional characteristics, and be derived...Boehm, Barry W. "Software Engineering," Tutorial: Software Design Strategies, 2nd Edition. 35-50. Los Angeles CA: IEEE Computer Society Press, 1981... Pressman , Roger S. Software Engineering: A Practitioner’s Approach, Second Edi- tion. McGraw-Hill Book Company, New York, 1988. 59. Quinn, Michael J
CELLFS: TAKING THE "DMA" OUT OF CELL PROGRAMMING

DOE Office of Scientific and Technical Information (OSTI.GOV)

IONKOV, LATCHESAR A.; MIRTCHOVSKI, ANDREY A.; NYRHINEN, AKI M.

In this paper we present a new programming model for the Cell BE architecture of scalar multiprocessors. They call this programming model CellFS. CellFS aims at simplifying the task of managing I/O between the local store of the processing units and main memory. The CellFS support library provides the means for transferring data via simple file I/O operations between the PPU and the SPU.
A Model and Algorithms For a Software Evolution Control System

DTIC Science & Technology

1993-12-01

dynamic scheduling approaches can be found in [67). Task scheduling can also be characterized as preemptive and nonpreemptive . A task is preemptive ...is NP-hard for both the preemptive and nonpreemptive cases [671 [84). Scheduling nonpreemptive tasks with arbitrary ready times is NP-hard in both...the preemptive and nonpreemptive cases [671 [841. Scheduling nonpreemptive tasks with arbitrary ready times is NP-hard in both multiprocessor and
Information Processing Research

DTIC Science & Technology

1988-01-01

the Hitech chess machine, which achieves its success from parallelism in the right places. Hitech has now reached a National rating of 2359, making it...outset that success depended on building real systems and subjecting them to use by a large number of faculty and students within the Department. We...central server workstations each acting as a host for a Warp machine, and a few Warp multiprocessors. The command interpreter is executed in Lisp on
Force user's manual, revised

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

1987-01-01

A methodology for writing parallel programs for shared memory multiprocessors has been formalized as an extension to the Fortran language and implemented as a macro preprocessor. The extended language is known as the Force, and this manual describes how to write Force programs and execute them on the Flexible Computer Corporation Flex/32, the Encore Multimax and the Sequent Balance computers. The parallel extension macros are described in detail, but knowledge of Fortran is assumed.
Research in Structures and Dynamics, 1984

NASA Technical Reports Server (NTRS)

Hayduk, R. J. (Compiler); Noor, A. K. (Compiler)

1984-01-01

A symposium on advanced and trends in structures and dynamics was held to communicate new insights into physical behavior and to identify trends in the solution procedures for structures and dynamics problems. Pertinent areas of concern were (1) multiprocessors, parallel computation, and database management systems, (2) advances in finite element technology, (3) interactive computing and optimization, (4) mechanics of materials, (5) structural stability, (6) dynamic response of structures, and (7) advanced computer applications.
VLSI Based Multiprocessor Communications Networks.

DTIC Science & Technology

1982-09-01

year of the contract. Research plans for year three are also presented. Need for a research effort in the area of VLSI based communication networks... plans for year three of the contract. Section 4 concludes with a summary discussion of the research thus far. A number of appendices follow the main...pin constraints. We plan to investigate some -12- of these issues during the coming year in addition to developing similar models and bandwidth
Communications Patterns in a Symbolic Multiprocessor.

DTIC Science & Technology

1987-06-01

instruction references that Multilisp programs make. The cache hit ratio is greatest when instruction references have a high degree of -- locality. Another...future touches hit an undetermined future. N, The only exception is Consim, in which one third of future touches hit unde- termined futures. Task...Cambridge, MA, June 1985. [52] S. Sugimoto, K. Agusa, K. Tabata , and Y. Ohno. A multi-microprocessor system for concurrent Lisp. In Proceedings of
Principles for problem aggregation and assignment in medium scale multiprocessors

NASA Technical Reports Server (NTRS)

Nicol, David M.; Saltz, Joel H.

1987-01-01

One of the most important issues in parallel processing is the mapping of workload to processors. This paper considers a large class of problems having a high degree of potential fine grained parallelism, and execution requirements that are either not predictable, or are too costly to predict. The main issues in mapping such a problem onto medium scale multiprocessors are those of aggregation and assignment. We study a method of parameterized aggregation that makes few assumptions about the workload. The mapping of aggregate units of work onto processors is uniform, and exploits locality of workload intensity to balance the unknown workload. In general, a finer aggregate granularity leads to a better balance at the price of increased communication/synchronization costs; the aggregation parameters can be adjusted to find a reasonable granularity. The effectiveness of this scheme is demonstrated on three model problems: an adaptive one-dimensional fluid dynamics problem with message passing, a sparse triangular linear system solver on both a shared memory and a message-passing machine, and a two-dimensional time-driven battlefield simulation employing message passing. Using the model problems, the tradeoffs are studied between balanced workload and the communication/synchronization costs. Finally, an analytical model is used to explain why the method balances workload and minimizes the variance in system behavior.
Mobile Thread Task Manager

NASA Technical Reports Server (NTRS)

Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin J.

2013-01-01

The Mobile Thread Task Manager (MTTM) is being applied to parallelizing existing flight software to understand the benefits and to develop new techniques and architectural concepts for adapting software to multicore architectures. It allocates and load-balances tasks for a group of threads that migrate across processors to improve cache performance. In order to balance-load across threads, the MTTM augments a basic map-reduce strategy to draw jobs from a global queue. In a multicore processor, memory may be "homed" to the cache of a specific processor and must be accessed from that processor. The MTTB architecture wraps access to data with thread management to move threads to the home processor for that data so that the computation follows the data in an attempt to avoid L2 cache misses. Cache homing is also handled by a memory manager that translates identifiers to processor IDs where the data will be homed (according to rules defined by the user). The user can also specify the number of threads and processors separately, which is important for tuning performance for different patterns of computation and memory access. MTTM efficiently processes tasks in parallel on a multiprocessor computer. It also provides an interface to make it easier to adapt existing software to a multiprocessor environment.

The Design and Evaluation of "CAPTools"--A Computer Aided Parallelization Toolkit

NASA Technical Reports Server (NTRS)

Yan, Jerry; Frumkin, Michael; Hribar, Michelle; Jin, Haoqiang; Waheed, Abdul; Johnson, Steve; Cross, Jark; Evans, Emyr; Ierotheou, Constantinos; Leggett, Pete;

1998-01-01

Writing applications for high performance computers is a challenging task. Although writing code by hand still offers the best performance, it is extremely costly and often not very portable. The Computer Aided Parallelization Tools (CAPTools) are a toolkit designed to help automate the mapping of sequential FORTRAN scientific applications onto multiprocessors. CAPTools consists of the following major components: an inter-procedural dependence analysis module that incorporates user knowledge; a 'self-propagating' data partitioning module driven via user guidance; an execution control mask generation and optimization module for the user to fine tune parallel processing of individual partitions; a program transformation/restructuring facility for source code clean up and optimization; a set of browsers through which the user interacts with CAPTools at each stage of the parallelization process; and a code generator supporting multiple programming paradigms on various multiprocessors. Besides describing the rationale behind the architecture of CAPTools, the parallelization process is illustrated via case studies involving structured and unstructured meshes. The programming process and the performance of the generated parallel programs are compared against other programming alternatives based on the NAS Parallel Benchmarks, ARC3D and other scientific applications. Based on these results, a discussion on the feasibility of constructing architectural independent parallel applications is presented.

An Adaptive Insertion and Promotion Policy for Partitioned Shared Caches

NASA Astrophysics Data System (ADS)

Mahrom, Norfadila; Liebelt, Michael; Raof, Rafikha Aliana A.; Daud, Shuhaizar; Hafizah Ghazali, Nur

2018-03-01

Cache replacement policies in chip multiprocessors (CMP) have been investigated extensively and proven able to enhance shared cache management. However, competition among multiple processors executing different threads that require simultaneous access to a shared memory may cause cache contention and memory coherence problems on the chip. These issues also exist due to some drawbacks of the commonly used Least Recently Used (LRU) policy employed in multiprocessor systems, which are because of the cache lines residing in the cache longer than required. In image processing analysis of for example extra pulmonary tuberculosis (TB), an accurate diagnosis for tissue specimen is required. Therefore, a fast and reliable shared memory management system to execute algorithms for processing vast amount of specimen image is needed. In this paper, the effects of the cache replacement policy in a partitioned shared cache are investigated. The goal is to quantify whether better performance can be achieved by using less complex replacement strategies. This paper proposes a Middle Insertion 2 Positions Promotion (MI2PP) policy to eliminate cache misses that could adversely affect the access patterns and the throughput of the processors in the system. The policy employs a static predefined insertion point, near distance promotion, and the concept of ownership in the eviction policy to effectively improve cache thrashing and to avoid resource stealing among the processors.
Algorithms and architecture for multiprocessor based circuit simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deutsch, J.T.

Accurate electrical simulation is critical to the design of high performance integrated circuits. Logic simulators can verify function and give first-order timing information. Switch level simulators are more effective at dealing with charge sharing than standard logic simulators, but cannot provide accurate timing information or discover DC problems. Delay estimation techniques and cell level simulation can be used in constrained design methods, but must be tuned for each application, and circuit simulation must still be used to generate the cell models. None of these methods has the guaranteed accuracy that many circuit designers desire, and none can provide detailed waveformmore » information. Detailed electrical-level simulation can predict circuit performance if devices and parasitics are modeled accurately. However, the computational requirements of conventional circuit simulators make it impractical to simulate current large circuits. In this dissertation, the implementation of Iterated Timing Analysis (ITA), a relaxation-based technique for accurate circuit simulation, on a special-purpose multiprocessor is presented. The ITA method is an SOR-Newton, relaxation-based method which uses event-driven analysis and selective trace to exploit the temporal sparsity of the electrical network. Because event-driven selective trace techniques are employed, this algorithm lends itself to implementation on a data-driven computer.« less
Efficient Approximation Algorithms for Weighted $b$-Matching

DOE Office of Scientific and Technical Information (OSTI.GOV)

Khan, Arif; Pothen, Alex; Mostofa Ali Patwary, Md.

2016-01-01

We describe a half-approximation algorithm, b-Suitor, for computing a b-Matching of maximum weight in a graph with weights on the edges. b-Matching is a generalization of the well-known Matching problem in graphs, where the objective is to choose a subset of M edges in the graph such that at most a specified number b(v) of edges in M are incident on each vertex v. Subject to this restriction we maximize the sum of the weights of the edges in M. We prove that the b-Suitor algorithm computes the same b-Matching as the one obtained by the greedy algorithm for themore » problem. We implement the algorithm on serial and shared-memory parallel processors, and compare its performance against a collection of approximation algorithms that have been proposed for the Matching problem. Our results show that the b-Suitor algorithm outperforms the Greedy and Locally Dominant edge algorithms by one to two orders of magnitude on a serial processor. The b-Suitor algorithm has a high degree of concurrency, and it scales well up to 240 threads on a shared memory multiprocessor. The b-Suitor algorithm outperforms the Locally Dominant edge algorithm by a factor of fourteen on 16 cores of an Intel Xeon multiprocessor.« less
Parallel solution of the symmetric tridiagonal eigenproblem. Research report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-10-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed-memory Multiple Instruction, Multiple Data multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speed up, and accuracy. Experiments on an IPSC hypercube multiprocessor reveal that Cuppen's method ismore » the most accurate approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effect of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptions of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Parallel solution of the symmetric tridiagonal eigenproblem

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1989-01-01

This thesis discusses methods for computing all eigenvalues and eigenvectors of a symmetric tridiagonal matrix on a distributed memory MIMD multiprocessor. Only those techniques having the potential for both high numerical accuracy and significant large-grained parallelism are investigated. These include the QL method or Cuppen's divide and conquer method based on rank-one updating to compute both eigenvalues and eigenvectors, bisection to determine eigenvalues, and inverse iteration to compute eigenvectors. To begin, the methods are compared with respect to computation time, communication time, parallel speedup, and accuracy. Experiments on an iPSC hyper-cube multiprocessor reveal that Cuppen's method is the most accuratemore » approach, but bisection with inverse iteration is the fastest and most parallel. Because the accuracy of the latter combination is determined by the quality of the computed eigenvectors, the factors influencing the accuracy of inverse iteration are examined. This includes, in part, statistical analysis of the effects of a starting vector with random components. These results are used to develop an implementation of inverse iteration producing eigenvectors with lower residual error and better orthogonality than those generated by the EISPACK routine TINVIT. This thesis concludes with adaptations of methods for the symmetric tridiagonal eigenproblem to the related problem of computing the singular value decomposition (SVD) of a bidiagonal matrix.« less
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Tong, Charles; Swarztrauber, Paul N.

1989-01-01

Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
FTMP (Fault Tolerant Multiprocessor) programmer's manual

NASA Technical Reports Server (NTRS)

Feather, F. E.; Liceaga, C. A.; Padilla, P. A.

1986-01-01

The Fault Tolerant Multiprocessor (FTMP) computer system was constructed using the Rockwell/Collins CAPS-6 processor. It is installed in the Avionics Integration Research Laboratory (AIRLAB) of NASA Langley Research Center. It is hosted by AIRLAB's System 10, a VAX 11/750, for the loading of programs and experimentation. The FTMP support software includes a cross compiler for a high level language called Automated Engineering Design (AED) System, an assembler for the CAPS-6 processor assembly language, and a linker. Access to this support software is through an automated remote access facility on the VAX which relieves the user of the burden of learning how to use the IBM 4381. This manual is a compilation of information about the FTMP support environment. It explains the FTMP software and support environment along many of the finer points of running programs on FTMP. This will be helpful to the researcher trying to run an experiment on FTMP and even to the person probing FTMP with fault injections. Much of the information in this manual can be found in other sources; we are only attempting to bring together the basic points in a single source. If the reader should need points clarified, there is a list of support documentation in the back of this manual.
Parallel processing and expert systems

NASA Technical Reports Server (NTRS)

Lau, Sonie; Yan, Jerry C.

1991-01-01

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Solutions and debugging for data consistency in multiprocessors with noncoherent caches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bernstein, D.; Mendelson, B.; Breternitz, M. Jr.

1995-02-01

We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on softwaremore » methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.« less
MASC: Multiprocessor Architecture for Symbolic Processing

DTIC Science & Technology

1989-08-01

accomplish the ultimate goal of reducing the time to develop the 5- 12 correct parse. 5.6.5 Effect of’ Sentence Length Figure 5-6 shows the relationship...later; this effectively restricts the breadth of study rather severely. 6.1.2. Objective The tools described here have been developed in response to...either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government. ROME AIR DEVELOPMENT CENTER Air Force Systems
Communication-Driven Codesign for Multiprocessor Systems

DTIC Science & Technology

2004-01-01

processors, FPGA or ASIC subsystems, mi- croprocessors, and microcontrollers. When a processor is embedded within a SLOT architecture, one or more...Broderson, Low-power CMOS digital design, IEEE Journal of Solid-State Circuits 27 (1992), no. 4, 473–484. [25] L. Chao and E. Sha , Scheduling data-flow...1997), 239– 256 . [82] P. K. Murthy, E. G. Cohen, and S. Rowland, System Canvas: A new design en- vironment for embedded DSP and telecommunications
Processor tradeoffs in distributed real-time systems

NASA Technical Reports Server (NTRS)

Krishna, C. M.; Shin, Kang G.; Bhandari, Inderpal S.

1987-01-01

The problem of the optimization of the design of real-time distributed systems is examined with reference to a class of computer architectures similar to the continuously reconfigurable multiprocessor flight control system structure, CM2FCS. Particular attention is given to the impact of processor replacement and the burn-in time on the probability of dynamic failure and mean cost. The solution is obtained numerically and interpreted in the context of real-time applications.
Interface Message Processors for the ARPA Computer Network

DTIC Science & Technology

1976-07-01

and then clear the location) as its primitive locking facility (i.e., as the necessary multiprocessor lock equivalent to Dijkstra semaphores )[37]. To...of the extra storage required for the redundant copies. There is the problem of maintaining synchronization of multiple copy data bases in the presence...through any of the data base sites. I Update synchronization . Races between conflicting, "concurrent" update requests are resolved in a manner that j
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

PubMed Central

Mitchell, William F.

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.

PubMed

Mitchell, William F

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
Integration, Development and Performance of the 500 TFLOPS Heterogeneous Cluster (Condor)

DTIC Science & Technology

2012-08-01

PlayStation 3 for High Performance Cluster Computing” LAPACK Working Note 185, 2007. [ 4 ] Feng, W., X. Feng, and R. Ge, “Green Supercomputing Comes of...CONFERENCE PAPER (Post Print) 3. DATES COVERED (From - To) JUN 2010 – MAY 2013 4 . TITLE AND SUBTITLE INTEGRATION, DEVELOPMENT AND PERFORMANCE OF...and streaming processing; the PlayStation 3 uses the IBM Cell BE processor, which adopts the multi-processor, single-instruction-multiple- data (SIMD
MIT Laboratory for Computer Science Progress Report, July 1984-June 1985

DTIC Science & Technology

1985-06-01

larger (up to several thousand machines) multiprocessor systems. This facility, funded by the newly formed Strategic Computing Program of the Defense...Szolovits, Group Leader R. Patil Collaborating Investigators M. Criscitiello, M.D., Tufts-New England Medical Center Hospital J. Dzierzanowski, Ph.D., Dept...COMPUTATION STRUCTURES Academic Staff J. B. Dennis, Group Leader Research Staff W. B. Ackerman G. A. Boughton W. Y-P. Lim Graduate Students T-A. Chu S
Laboratory for Computer Science Progress Report 21, July 1983-June 1984.

DTIC Science & Technology

1984-06-01

Systems 269 4. Distributed Consensus 270 5. Election of a Leader in a Distributed Ring of Processors 273 6. Distributed Network Algorithms 274 7. Diagnosis...multiprocessor systems. This facility, funded by the new!y formed Strategic Computing Program of the Defense Advanced Research Projects Agency, will enable...Academic Staff P. Szo)ovits, Group Leader R. Patil Collaborating Investigators M. Criscitiello, M.D., Tufts-New England Medical Center Hospital R
Real-time optical laboratory solution of parabolic differential equations

NASA Technical Reports Server (NTRS)

Casasent, David; Jackson, James

1988-01-01

An optical laboratory matrix-vector processor is used to solve parabolic differential equations (the transient diffusion equation with two space variables and time) by an explicit algorithm. This includes optical matrix-vector nonbase-2 encoded laboratory data, the combination of nonbase-2 and frequency-multiplexed data on such processors, a high-accuracy optical laboratory solution of a partial differential equation, new data partitioning techniques, and a discussion of a multiprocessor optical matrix-vector architecture.

A parallel algorithm for generation and assembly of finite element stiffness and mass matrices

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Carmona, E. A.; Nguyen, D. T.; Baddourah, M. A.

1991-01-01

A new algorithm is proposed for parallel generation and assembly of the finite element stiffness and mass matrices. The proposed assembly algorithm is based on a node-by-node approach rather than the more conventional element-by-element approach. The new algorithm's generality and computation speed-up when using multiple processors are demonstrated for several practical applications on multi-processor Cray Y-MP and Cray 2 supercomputers.
DARPA Status Report - November 1988

DTIC Science & Technology

1988-11-01

style used in the applic4#ons reference to that block was by processor j. where j It. We was influenced by it. MACH is a multiprocessor operating S call...it can be order they occurred. However. the exact time at which the treated specially in memory management , and so most of the reference wa, made is...on cache consistency performance, sophisti- peak can be explained as clinging references that occur when cated cache management schemes that take
The Design and Implementation of a Data Flow Multiprocessor.

DTIC Science & Technology

1981-12-01

to thank Captain Charles Papp who taught me how to use the logic analyzer and the storage oscilloscope. Without these tools, I could never have...debugged and repaired the microprocessors. Finally, I wish to thank my thesis readers, Major Charles Lillie and Major Walt Seward, for taking valuable time...Neumann/ Babbage architecture with the a data flow architecture. The next section describes the benefits of data flow computing. The following section
The FORCE: A highly portable parallel programming language

NASA Technical Reports Server (NTRS)

Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger

1989-01-01

Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory microprocessors, and how a two-level macro preprocessor makes it possible to hide low level machine dependencies and to build machine-independent high level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared memory multiprocessor executing them.
Two-Dimensional Signal Processing, Optical Information Storage and Processing, and Electromagnetic Measurements

DTIC Science & Technology

1992-07-15

Modeling SENIOR PRINCIPAL INVESTIGATOR: R. M. Mersereau, Regents’ Professor SCIENTIFIC PERSONNEL: F. J. Malassenet, (Ph.D. received, Dec. 1991) T. R. Gardos ...multiprocessor environment. RESEARCH ACCOMPLISHMENTS: Students Gardos , Martucci, Rao, and Truong were partially supported using funds supplied by Georgia...pp. 21- 56, 1991. [4] T. R. Gardos and R. M. Mersereau, "FIR Filtering on a Lattice with Periodically Deleted Samples," Proc. 1991 IEEE Int. Conf
Design and Implementation of Embedded Computer Vision Systems Based on Particle Filters

DTIC Science & Technology

2010-01-01

for hardware/software implementa- tion of multi-dimensional particle filter application and we explore this in the third application which is a 3D...methodology for hardware/software implementation of multi-dimensional particle filter application and we explore this in the third application which is a...and hence multiprocessor implementation of parti- cle filters is an important option to examine. A significant body of work exists on optimizing generic
Fast adaptive composite grid methods on distributed parallel architectures

NASA Technical Reports Server (NTRS)

Lemke, Max; Quinlan, Daniel

1992-01-01

The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
Hybrid Memory Management for Parallel Execution of Prolog on Shared Memory Multiprocessors

DTIC Science & Technology

1990-06-01

organizing data to increase locality. The stack structure exhibits greater locality than the heap structure. Tradeoff decisions can also be made on...PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES...University of California at Berkeley,Department of Electrical Engineering and Computer Sciences,Berkeley,CA,94720 8. PERFORMING ORGANIZATION REPORT
Communication Strategies for Shared-Bus Embedded Multiprocessors

DTIC Science & Technology

2005-09-01

target architecture [10]. We utilize the task execution model in [11], where each task vi in the task graph G = (V,E) is associated with three possible...predictability is therefore an interesting and important direction for further study. REFERENCES [1] T. Kogel, M. Doerper, A. Wieferink, R. Leupers, G ...Proceedings of Real-Time Technology and Applications Symposium, 1995, pp. 164–173. [11] S. Hua, G . Qu, and S. Bhattacharyya, “Energy reduction technique
The DASH Project: An Overview

DTIC Science & Technology

1988-02-29

by memory copyin g will degrade system performance on shared-memory multiprocessors. Virtual memor y (VM) remapping, as opposed to memory copying...Bershad, G.D. Giuseppe Facchetti, Kevin Fall, G . Scott Graham, Ellen Nelson , P. Venkat Rangan, Bruno Sartirana, Shin-Yuan Tzou, Raj Vaswani, and Robert...Remote Execution in NEST", IEEE Trans. on Software Eng. 13, 8 (August 1987), 905-912. 3. G . T. Almes, A. P. Black, E. Lazowska and J. Noe, "The Eden
Enabling Next-Generation Multicore Platforms in Embedded Applications

DTIC Science & Technology

2014-04-01

mapping to sets 129 − 256 ) to the second page in memory, color 2 (sets 257 − 384) to the third page, and so on. Then, after the 32nd page, all 212 sets...the Real-Time Nested Locking Protocol (RNLP) [56], a recently developed multiprocessor real-time locking protocol that optimally supports the...RELEASE; DISTRIBUTION UNLIMITED 15 In general, the problems of optimally assigning tasks to processors and colors to tasks are both NP-hard in the
Domain Decomposition: A Bridge between Nature and Parallel Computers

DTIC Science & Technology

1992-09-01

B., "Domain Decomposition Algorithms for Indefinite Elliptic Problems," S"IAM Journal of S; cientific and Statistical (’omputing, Vol. 13, 1992, pp...AD-A256 575 NASA Contractor Report 189709 ICASE Report No. 92-44 ICASE DOMAIN DECOMPOSITION: A BRIDGE BETWEEN NATURE AND PARALLEL COMPUTERS DTIC dE...effectively implemented on dis- tributed memory multiprocessors. In 1990 (as reported in Ref. 38 using the tile algo- rithm), a 103,201-unknown 2D elliptic
A CRAY-Class Multiprocessor Simulator.

DTIC Science & Technology

1983-09-01

instruction will appear in this column. 38 ! ?m L < ¢ < . - ?. .., ..i.-..--. .°.- ’’ , ... .,. ... 18. BCG -The parcel buffer change flag. When the next...program is first started up, no instruction parcels are in the parcel buffers so a fetch is required. The asterisk in the BCG field indicates a buffer...an in-buffer target address. While the buffer change is in progress, the instruction causing the change will place its tag in the BCG column. Once the
Investigation into the Use of Texturing for Real-Time Computer Animation.

DTIC Science & Technology

1987-12-01

produce a rough polygon surface [7]. Research in the area of real time texturing has also been conducted. Using a specially designed multi-processor system ...Oka, Tsutsui, Ohba, Kurauchi and Tago have introduced real-time manipulation of texture mapped surfaces [8]. Using multi- processors, systems will...a call to the system function defpattern(n,size,mask) short n,size; short *mask, which takes as input an index to a system table of patterns, a
Software-Controlled Caches in the VMP Multiprocessor

DTIC Science & Technology

1986-03-01

programming system level that Processors is tuned for the VMP design. In this vein, we are interested in exploring how far the software support can go to ...handled in software, analogously to the handling agement of the shared program state is familiar and of virtual memory page faults. Hardware support for...ensure good behavior, as opposed to how Each cache miss results in bus traffic. Table 2 pro- vides the bus cost for the "average" cache miss. Fig
Orbit Software Suite

NASA Technical Reports Server (NTRS)

Osgood, Cathy; Williams, Kevin; Gentry, Philip; Brownfield, Dana; Hallstrom, John; Stuit, Tim

2012-01-01

Orbit Software Suite is used to support a variety of NASA/DM (Dependable Multiprocessor) mission planning and analysis activities on the IPS (Intrusion Prevention System) platform. The suite of Orbit software tools (Orbit Design and Orbit Dynamics) resides on IPS/Linux workstations, and is used to perform mission design and analysis tasks corresponding to trajectory/ launch window, rendezvous, and proximity operations flight segments. A list of tools in Orbit Software Suite represents tool versions established during/after the Equipment Rehost-3 Project.
Dynamic grid refinement for partial differential equations on parallel computers

NASA Technical Reports Server (NTRS)

Mccormick, S.; Quinlan, D.

1989-01-01

The fast adaptive composite grid method (FAC) is an algorithm that uses various levels of uniform grids to provide adaptive resolution and fast solution of PDEs. An asynchronous version of FAC, called AFAC, that completely eliminates the bottleneck to parallelism is presented. This paper describes the advantage that this algorithm has in adaptive refinement for moving singularities on multiprocessor computers. This work is applicable to the parallel solution of two- and three-dimensional shock tracking problems.
Parallel-Processing Test Bed For Simulation Software

NASA Technical Reports Server (NTRS)

Blech, Richard; Cole, Gary; Townsend, Scott

1996-01-01

Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).
Systematic generation of multibody equations of motion suitable for recursive and parallel manipulation

NASA Technical Reports Server (NTRS)

Nikravesh, Parviz E.; Gim, Gwanghum; Arabyan, Ara; Rein, Udo

1989-01-01

The formulation of a method known as the joint coordinate method for automatic generation of the equations of motion for multibody systems is summarized. For systems containing open or closed kinematic loops, the equations of motion can be reduced systematically to a minimum number of second order differential equations. The application of recursive and nonrecursive algorithms to this formulation, computational considerations and the feasibility of implementing this formulation on multiprocessor computers are discussed.
Cache Coherence Protocols for Large-Scale Multiprocessors

DTIC Science & Technology

1990-09-01

and is compared with the other protocols for large-scale machines. In later analysis, this coherence method is designated by the acronym OCPD , which...private read misses 2 6 6 ( OCPD ) private write misses 2 6 6 Table 4.2: Transaction Types and Costs. the performance of the memory system. These...methodologies. Figure 4-2 shows the processor utiliza- tions of the Weather program, with special code in the dyn-nic post-mortem sched- 94 OCPD DlrINB

Fault tolerance in a supercomputer through dynamic repartitioning

DOEpatents

Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Takken, Todd E.

2007-02-27

A multiprocessor, parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups of processors can be swapped with any group which experiences a hardware failure. This swapping can be under software control, thereby permitting the entire computer to sustain a hardware failure but, after swapping in the standby processors, to still appear to software as a pristine, fully functioning system.
Algorithms and software for solving finite element equations on serial and parallel architectures

NASA Technical Reports Server (NTRS)

George, Alan

1989-01-01

Over the past 15 years numerous new techniques have been developed for solving systems of equations and eigenvalue problems arising in finite element computations. A package called SPARSPAK has been developed by the author and his co-workers which exploits these new methods. The broad objective of this research project is to incorporate some of this software in the Computational Structural Mechanics (CSM) testbed, and to extend the techniques for use on multiprocessor architectures.
Research in the Aloha system

NASA Technical Reports Server (NTRS)

Abramson, N.

1974-01-01

The Aloha system was studied and developed and extended to advanced forms of computer communications networks. Theoretical and simulation studies of Aloha type radio channels for use in packet switched communications networks were performed. Improved versions of the Aloha communications techniques and their extensions were tested experimentally. A packet radio repeater suitable for use with the Aloha system operational network was developed. General studies of the organization of multiprocessor systems centered on the development of the BCC 500 computer were concluded.
A Study of Data Compression and Efficient Memory Utilization in Multiprocessor and Distributed Systems.

DTIC Science & Technology

1980-08-31

DEPARTMENT OF THE ARMY POSITION, UNLESS SO DESIGNATED BY OTHER AUTHORIZED DOCUMENTS. -I , ! unclassified SECURITY CLASSIICATION Of THIS PAGE ’Whm bate...ADORE.SE4I dFfemtar CIatblftaE Office) IS. SECURITY CLASS. (of ftio #sNt) unclassified IS. DOCL ASSI FICATION/DOWNGRADING SCHEDULENA INA IS. DISTRIBUTION...In this section, we describe the characteristics of the access sequence of a pipelined processor. A pipelined organization in the most general sense
Benchmarking Ada tasking on tightly coupled multiprocessor architectures

NASA Technical Reports Server (NTRS)

Collard, Philippe; Goforth, Andre; Marquardt, Matthew

1989-01-01

The development of benchmarks and performance measures for parallel Ada tasking is reported with emphasis on the macroscopic behavior of the benchmark across a set of load parameters. The application chosen for the study was the NASREM model for telerobot control, relevant to many NASA missions. The results of the study demonstrate the potential of parallel Ada in accomplishing the task of developing a control system for a system such as the Flight Telerobotic Servicer using the NASREM framework.
Experiments with a Knowledge-Based System on a Multiprocessor

DTIC Science & Technology

1987-10-01

variables on the node. Functions have access to those instance variables. Gajski et. al. [ Gajski 82] summarize the principles underlying pure data flow...ConSider the following numerical example from Gajski et. al. ( Gajski 82]. The pseudo-code representation of the problem is as follows: co - 0 A=x £ 1 Ixo... Gajski et. al. They assume that division takes three processing units, multiplication takes two units, and addition takes one unit. As noted in their paper
Multiprocessor graphics computation and display using transputers

NASA Technical Reports Server (NTRS)

Ellis, Graham K.

1988-01-01

A package of two-dimensional graphics routines was developed to run on a transputer-based parallel processing system. These routines were designed to enable applications programmers to easily generate and display results from the transputer network in a graphic format. The graphics procedures were designed for the lowest possible network communication overhead for increased performance. The routines were designed for ease of use and to present an intuitive approach to generating graphics on the transputer parallel processing system.
Multilevel Parallelization of AutoDock 4.2.

PubMed

Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

2011-04-28

Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
Design and implementation of highly parallel pipelined VLSI systems

NASA Astrophysics Data System (ADS)

Delange, Alphonsus Anthonius Jozef

A methodology and its realization as a prototype CAD (Computer Aided Design) system for the design and analysis of complex multiprocessor systems is presented. The design is an iterative process in which the behavioral specifications of the system components are refined into structural descriptions consisting of interconnections and lower level components etc. A model for the representation and analysis of multiprocessor systems at several levels of abstraction and an implementation of a CAD system based on this model are described. A high level design language, an object oriented development kit for tool design, a design data management system, and design and analysis tools such as a high level simulator and graphics design interface which are integrated into the prototype system and graphics interface are described. Procedures for the synthesis of semiregular processor arrays, and to compute the switching of input/output signals, memory management and control of processor array, and sequencing and segmentation of input/output data streams due to partitioning and clustering of the processor array during the subsequent synthesis steps, are described. The architecture and control of a parallel system is designed and each component mapped to a module or module generator in a symbolic layout library, compacted for design rules of VLSI (Very Large Scale Integration) technology. An example of the design of a processor that is a useful building block for highly parallel pipelined systems in the signal/image processing domains is given.
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Nonpreemptive run-time scheduling issues on a multitasked, multiprogrammed multiprocessor with dependencies, bidimensional tasks, folding and dynamic graphs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, Allan Ray

1987-05-01

Increases in high speed hardware have mandated studies in software techniques to exploit the parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard, thus, a variety of heuristics aremore » examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.« less
Self-powered information measuring wireless networks using the distribution of tasks within multicore processors

NASA Astrophysics Data System (ADS)

Zhuravska, Iryna M.; Koretska, Oleksandra O.; Musiyenko, Maksym P.; Surtel, Wojciech; Assembay, Azat; Kovalev, Vladimir; Tleshova, Akmaral

2017-08-01

The article contains basic approaches to develop the self-powered information measuring wireless networks (SPIM-WN) using the distribution of tasks within multicore processors critical applying based on the interaction of movable components - as in the direction of data transmission as wireless transfer of energy coming from polymetric sensors. Base mathematic model of scheduling tasks within multiprocessor systems was modernized to schedule and allocate tasks between cores of one-crystal computer (SoC) to increase energy efficiency SPIM-WN objects.
Multiprocessor Z-Buffer Architecture for High-Speed, High Complexity Computer Image Generation.

DTIC Science & Technology

1983-12-01

Oversampling 50 17. "Poking Through" Effects 51 18. Sampling Paths 52 19. Triangle Variables 54 20. Intelligent Tiling Algorithm 61 21. Tiler Functional Blocks...64 * 22. HSD Interface 65 23. Tiling Machine Setup 67 24. Tiling Machine 68 25. Tile Accumulate 69 26. A lx$ Sorting Machine 77 27. A 2x8 Sorting...Delay 227 87. Effect of Triangle Size on Tiler Throughput Rates 229 88. Tiling Machine Setup Stage Performance for Oversample Mode 234 89. Tiling
A Linked-Cell Domain Decomposition Method for Molecular Dynamics Simulation on a Scalable Multiprocessor

DOE PAGES

Yang, L. H.; Brooks III, E. D.; Belak, J.

1992-01-01

A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evoles. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
Research on blackboard architectures at the Heuristic Programming Project (HPP)

NASA Technical Reports Server (NTRS)

Nii, H. Penny

1985-01-01

Researchers are entering the second decade of research in the Blackboard problem solving framework with focus in the following areas: (1) extensions to the basic concepts implemented in AGE-1 to address, for example, reasoning with uncertain data; (2) a new architecture and development environment, BB1, that implements methods for explicity controlling the reasoning; and (3) the design of and experimentation with multiprocessor architectures using the Blackboard as an organizing framework. A summary of these efforts is presented.
Simulation Analysis of Data Sharing in Shared Memory Multiprocessors

DTIC Science & Technology

1989-02-24

LIMITATION OF ABSTRACT Same as Report (SAR) 18. NUMBER OF PAGES 178 19a. NAME OF RESPONSIBLE PERSON a. REPORT unclassified b . ABSTRACT unclassified...work. Andrea Casotto (CELL), Steve McGrogan (SPICE), Srinivas Devadas (TOPOP1) and Hi-Keung Tony Ma (VERIFY) donated the parallel programs and a con...Effect of Block Size on B us Utilization 120 5-14 Ratio of Sharing Bus Cyc les to Total Bus Cycles 120 5-15 Oassification of Bus Cyc les for
Object classification for obstacle avoidance

NASA Astrophysics Data System (ADS)

Regensburger, Uwe; Graefe, Volker

1991-03-01

Object recognition is necessary for any mobile robot operating autonomously in the real world. This paper discusses an object classifier based on a 2-D object model. Obstacle candidates are tracked and analyzed false alarms generated by the object detector are recognized and rejected. The methods have been implemented on a multi-processor system and tested in real-world experiments. They work reliably under favorable conditions but sometimes problems occur e. g. when objects contain many features (edges) or move in front of structured background.
Expert Systems on Multiprocessor Architectures. Volume 3. Technical Reports

DTIC Science & Technology

1991-06-01

choice of load balancing vs. load sharing 1141. While load balancing strives to keep all sites equally loaded, load sharing merely tries to prevent ...unnecessary idleness. Loo. balancing is appropriate to object- oriented real- time systems because * real-time systems ne ,l to prevent long waits for...oetavir ConClass siy51cr Iz a n ubjeU rephitation ’-enare ir order wo prevent a partic=Lar abiec:;ram heing (ntrlu ~lel Ar iic]en:f etautaan ire chanw
Hierarchically Parallelized Constrained Nonlinear Solvers with Automated Substructuring

NASA Technical Reports Server (NTRS)

Padovan, Joe; Kwang, Abel

1994-01-01

This paper develops a parallelizable multilevel multiple constrained nonlinear equation solver. The substructuring process is automated to yield appropriately balanced partitioning of each succeeding level. Due to the generality of the procedure,_sequential, as well as partially and fully parallel environments can be handled. This includes both single and multiprocessor assignment per individual partition. Several benchmark examples are presented. These illustrate the robustness of the procedure as well as its capability to yield significant reductions in memory utilization and calculational effort due both to updating and inversion.
The Effects of Block Size on the Performance of Coherent Caches in Shared-Memory Multiprocessors

DTIC Science & Technology

1993-05-01

increase with the bandwidth and latency. For those applications with poor spatial locality, the best choice of cache line size is determined by the...observation was used in the design of two schemes: LimitLESS di- rectories and Tag caches. LimitLESS directories [15] were designed for the ALEWIFE...small packets may be used to avoid network congestion. The most important factor influencing the choice of cache line size for a multipro- cessor is the

Integrating Software Modules For Robot Control

NASA Technical Reports Server (NTRS)

Volpe, Richard A.; Khosla, Pradeep; Stewart, David B.

1993-01-01

Reconfigurable, sensor-based control system uses state variables in systematic integration of reusable control modules. Designed for open-architecture hardware including many general-purpose microprocessors, each having own local memory plus access to global shared memory. Implemented in software as extension of Chimera II real-time operating system. Provides transparent computing mechanism for intertask communication between control modules and generic process-module architecture for multiprocessor realtime computation. Used to control robot arm. Proves useful in variety of other control and robotic applications.
Utilizing Dynamically Coupled Cores to Form a Resilient Chip Multiprocessor

DTIC Science & Technology

2007-06-01

requires a significant deviation from previous work. For instance, we find that using the relaxed input replication model from Reunion incurs a...Circuit Width Delay Count CRC-16 16 6.65 754 CRC- SDLC -16 16 6.10 888 CRC-32 16 7.28 2260 CRC-32 32 8.60 4240 Table 1. FO4 delay and transistor count for...the operation of our proposed system is the same in all other respects. 4.4 Compatibility Across Memory Consis- tency Models The memory consistency
A model for the distributed storage and processing of large arrays

NASA Technical Reports Server (NTRS)

Mehrota, P.; Pratt, T. W.

1983-01-01

A conceptual model for parallel computations on large arrays is developed. The model provides a set of language concepts appropriate for processing arrays which are generally too large to fit in the primary memories of a multiprocessor system. The semantic model is used to represent arrays on a concurrent architecture in such a way that the performance realities inherent in the distributed storage and processing can be adequately represented. An implementation of the large array concept as an Ada package is also described.
Distributed optimization system and method

DOEpatents

Hurtado, John E.; Dohrmann, Clark R.; Robinett, III, Rush D.

2003-06-10

A search system and method for controlling multiple agents to optimize an objective using distributed sensing and cooperative control. The search agent can be one or more physical agents, such as a robot, and can be software agents for searching cyberspace. The objective can be: chemical sources, temperature sources, radiation sources, light sources, evaders, trespassers, explosive sources, time dependent sources, time independent sources, function surfaces, maximization points, minimization points, and optimal control of a system such as a communication system, an economy, a crane, and a multi-processor computer.
Distributed Optimization System

DOEpatents

Hurtado, John E.; Dohrmann, Clark R.; Robinett, III, Rush D.

2004-11-30

A search system and method for controlling multiple agents to optimize an objective using distributed sensing and cooperative control. The search agent can be one or more physical agents, such as a robot, and can be software agents for searching cyberspace. The objective can be: chemical sources, temperature sources, radiation sources, light sources, evaders, trespassers, explosive sources, time dependent sources, time independent sources, function surfaces, maximization points, minimization points, and optimal control of a system such as a communication system, an economy, a crane, and a multi-processor computer.
Evaluation of SuperLU on multicore architectures

NASA Astrophysics Data System (ADS)

Li, X. S.

2008-07-01

The Chip Multiprocessor (CMP) will be the basic building block for computer systems ranging from laptops to supercomputers. New software developments at all levels are needed to fully utilize these systems. In this work, we evaluate performance of different high-performance sparse LU factorization and triangular solution algorithms on several representative multicore machines. We included both Pthreads and MPI implementations in this study and found that the Pthreads implementation consistently delivers good performance and that a left-looking algorithm is usually superior.
Parallel algorithms for simulating continuous time Markov chains

NASA Technical Reports Server (NTRS)

Nicol, David M.; Heidelberger, Philip

1992-01-01

We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
Macro-actor execution on multilevel data-driven architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaudiot, J.L.; Najjar, W.

1988-12-31

The data-flow model of computation brings to multiprocessors high programmability at the expense of increased overhead. Applying the model at a higher level leads to better performance but also introduces loss of parallelism. We demonstrate here syntax directed program decomposition methods for the creation of large macro-actors in numerical algorithms. In order to alleviate some of the problems introduced by the lower resolution interpretation, we describe a multi-level of resolution and analyze the requirements for its actual hardware and software integration.
A Multiprocessor Implementation of CSP (Communicating Sequential Processes)

DTIC Science & Technology

1988-03-01

P to check that "valid" communications can take place between P using guard g , and P,, and if so, to attempt to commit to P,. If a commit was...AltList,, gi): INTEGER that scans the remote alternative list AltList, looking for a matching and corn- patible guard g , to the local guard g ,. By...matching we mean gj contains an I/O operation with P. By compatible we mean g , and gj do not both contain input (output) commands. CheckGuard returns j
Multiprocessor architecture: Synthesis and evaluation

NASA Technical Reports Server (NTRS)

Standley, Hilda M.

1990-01-01

Multiprocessor computed architecture evaluation for structural computations is the focus of the research effort described. Results obtained are expected to lead to more efficient use of existing architectures and to suggest designs for new, application specific, architectures. The brief descriptions given outline a number of related efforts directed toward this purpose. The difficulty is analyzing an existing architecture or in designing a new computer architecture lies in the fact that the performance of a particular architecture, within the context of a given application, is determined by a number of factors. These include, but are not limited to, the efficiency of the computation algorithm, the programming language and support environment, the quality of the program written in the programming language, the multiplicity of the processing elements, the characteristics of the individual processing elements, the interconnection network connecting processors and non-local memories, and the shared memory organization covering the spectrum from no shared memory (all local memory) to one global access memory. These performance determiners may be loosely classified as being software or hardware related. This distinction is not clear or even appropriate in many cases. The effect of the choice of algorithm is ignored by assuming that the algorithm is specified as given. Effort directed toward the removal of the effect of the programming language and program resulted in the design of a high-level parallel programming language. Two characteristics of the fundamental structure of the architecture (memory organization and interconnection network) are examined.
Multi-processor including data flow accelerator module

DOEpatents

Davidson, George S.; Pierce, Paul E.

1990-01-01

An accelerator module for a data flow computer includes an intelligent memory. The module is added to a multiprocessor arrangement and uses a shared tagged memory architecture in the data flow computer. The intelligent memory module assigns locations for holding data values in correspondence with arcs leading to a node in a data dependency graph. Each primitive computation is associated with a corresponding memory cell, including a number of slots for operands needed to execute a primitive computation, a primitive identifying pointer, and linking slots for distributing the result of the cell computation to other cells requiring that result as an operand. Circuitry is provided for utilizing tag bits to determine automatically when all operands required by a processor are available and for scheduling the primitive for execution in a queue. Each memory cell of the module may be associated with any of the primitives, and the particular primitive to be executed by the processor associated with the cell is identified by providing an index, such as the cell number for the primitive, to the primitive lookup table of starting addresses. The module thus serves to perform functions previously performed by a number of sections of data flow architectures and coexists with conventional shared memory therein. A multiprocessing system including the module operates in a hybrid mode, wherein the same processing modules are used to perform some processing in a sequential mode, under immediate control of an operating system, while performing other processing in a data flow mode.
Parallel processing for scientific computations

NASA Technical Reports Server (NTRS)

Alkhatib, Hasan S.

1995-01-01

The scope of this project dealt with the investigation of the requirements to support distributed computing of scientific computations over a cluster of cooperative workstations. Various experiments on computations for the solution of simultaneous linear equations were performed in the early phase of the project to gain experience in the general nature and requirements of scientific applications. A specification of a distributed integrated computing environment, DICE, based on a distributed shared memory communication paradigm has been developed and evaluated. The distributed shared memory model facilitates porting existing parallel algorithms that have been designed for shared memory multiprocessor systems to the new environment. The potential of this new environment is to provide supercomputing capability through the utilization of the aggregate power of workstations cooperating in a cluster interconnected via a local area network. Workstations, generally, do not have the computing power to tackle complex scientific applications, making them primarily useful for visualization, data reduction, and filtering as far as complex scientific applications are concerned. There is a tremendous amount of computing power that is left unused in a network of workstations. Very often a workstation is simply sitting idle on a desk. A set of tools can be developed to take advantage of this potential computing power to create a platform suitable for large scientific computations. The integration of several workstations into a logical cluster of distributed, cooperative, computing stations presents an alternative to shared memory multiprocessor systems. In this project we designed and evaluated such a system.
Data traffic reduction schemes for Cholesky factorization on asynchronous multiprocessor systems

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Communication requirements of Cholesky factorization of dense and sparse symmetric, positive definite matrices are analyzed. The communication requirement is characterized by the data traffic generated on multiprocessor systems with local and shared memory. Lower bound proofs are given to show that when the load is uniformly distributed the data traffic associated with factoring an n x n dense matrix using n to the alpha power (alpha less than or equal 2) processors is omega(n to the 2 + alpha/2 power). For n x n sparse matrices representing a square root of n x square root of n regular grid graph the data traffic is shown to be omega(n to the 1 + alpha/2 power), alpha less than or equal 1. Partitioning schemes that are variations of block assignment scheme are described and it is shown that the data traffic generated by these schemes are asymptotically optimal. The schemes allow efficient use of up to O(n to the 2nd power) processors in the dense case and up to O(n) processors in the sparse case before the total data traffic reaches the maximum value of O(n to the 3rd power) and O(n to the 3/2 power), respectively. It is shown that the block based partitioning schemes allow a better utilization of the data accessed from shared memory and thus reduce the data traffic than those based on column-wise wrap around assignment schemes.
Computational design of RNA parts, devices, and transcripts with kinetic folding algorithms implemented on multiprocessor clusters.

PubMed

Thimmaiah, Tim; Voje, William E; Carothers, James M

2015-01-01

With progress toward inexpensive, large-scale DNA assembly, the demand for simulation tools that allow the rapid construction of synthetic biological devices with predictable behaviors continues to increase. By combining engineered transcript components, such as ribosome binding sites, transcriptional terminators, ligand-binding aptamers, catalytic ribozymes, and aptamer-controlled ribozymes (aptazymes), gene expression in bacteria can be fine-tuned, with many corollaries and applications in yeast and mammalian cells. The successful design of genetic constructs that implement these kinds of RNA-based control mechanisms requires modeling and analyzing kinetically determined co-transcriptional folding pathways. Transcript design methods using stochastic kinetic folding simulations to search spacer sequence libraries for motifs enabling the assembly of RNA component parts into static ribozyme- and dynamic aptazyme-regulated expression devices with quantitatively predictable functions (rREDs and aREDs, respectively) have been described (Carothers et al., Science 334:1716-1719, 2011). Here, we provide a detailed practical procedure for computational transcript design by illustrating a high throughput, multiprocessor approach for evaluating spacer sequences and generating functional rREDs. This chapter is written as a tutorial, complete with pseudo-code and step-by-step instructions for setting up a computational cluster with an Amazon, Inc. web server and performing the large numbers of kinefold-based stochastic kinetic co-transcriptional folding simulations needed to design functional rREDs and aREDs. The method described here should be broadly applicable for designing and analyzing a variety of synthetic RNA parts, devices and transcripts.
Switch for serial or parallel communication networks

DOEpatents

Crosette, D.B.

1994-07-19

A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination. 9 figs.
Switch for serial or parallel communication networks

DOEpatents

Crosette, Dario B.

1994-01-01

A communication switch apparatus and a method for use in a geographically extensive serial, parallel or hybrid communication network linking a multi-processor or parallel processing system has a very low software processing overhead in order to accommodate random burst of high density data. Associated with each processor is a communication switch. A data source and a data destination, a sensor suite or robot for example, may also be associated with a switch. The configuration of the switches in the network are coordinated through a master processor node and depends on the operational phase of the multi-processor network: data acquisition, data processing, and data exchange. The master processor node passes information on the state to be assumed by each switch to the processor node associated with the switch. The processor node then operates a series of multi-state switches internal to each communication switch. The communication switch does not parse and interpret communication protocol and message routing information. During a data acquisition phase, the communication switch couples sensors producing data to the processor node associated with the switch, to a downlink destination on the communications network, or to both. It also may couple an uplink data source to its processor node. During the data exchange phase, the switch couples its processor node or an uplink data source to a downlink destination (which may include a processor node or a robot), or couples an uplink source to its processor node and its processor node to a downlink destination.
Exploring the use of I/O nodes for computation in a MIMD multiprocessor

NASA Technical Reports Server (NTRS)

Kotz, David; Cai, Ting

1995-01-01

As parallel systems move into the production scientific-computing world, the emphasis will be on cost-effective solutions that provide high throughput for a mix of applications. Cost effective solutions demand that a system make effective use of all of its resources. Many MIMD multiprocessors today, however, distinguish between 'compute' and 'I/O' nodes, the latter having attached disks and being dedicated to running the file-system server. This static division of responsibilities simplifies system management but does not necessarily lead to the best performance in workloads that need a different balance of computation and I/O. Of course, computational processes sharing a node with a file-system service may receive less CPU time, network bandwidth, and memory bandwidth than they would on a computation-only node. In this paper we begin to examine this issue experimentally. We found that high performance I/O does not necessarily require substantial CPU time, leaving plenty of time for application computation. There were some complex file-system requests, however, which left little CPU time available to the application. (The impact on network and memory bandwidth still needs to be determined.) For applications (or users) that cannot tolerate an occasional interruption, we recommend that they continue to use only compute nodes. For tolerant applications needing more cycles than those provided by the compute nodes, we recommend that they take full advantage of both compute and I/O nodes for computation, and that operating systems should make this possible.
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

NASA Technical Reports Server (NTRS)

Chapman, Barbara; Mehrotra, Piyush; Zima, Hans

1991-01-01

Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
Method of predicting a change in an economy

DOEpatents

Pryor, Richard J [Albuquerque, NM; Basu, Nipa [Albany, NY

2006-01-10

An economy whose activity is to be predicted comprises a plurality of decision makers. Decision makers include, for example, households, government, industry, and banks. The decision makers are represented by agents, where an agent can represent one or more decision makers. Each agent has decision rules that determine the agent's actions. Each agent can affect the economy by affecting variable conditions characteristic of the economy or the internal state of other agents. Agents can communicate actions through messages. On a multiprocessor computer, the agents can be assigned to processing elements.
Models for evaluating the performability of degradable computing systems

NASA Technical Reports Server (NTRS)

Wu, L. T.

1982-01-01

Recent advances in multiprocessor technology established the need for unified methods to evaluate computing systems performance and reliability. In response to this modeling need, a general modeling framework that permits the modeling, analysis and evaluation of degradable computing systems is considered. Within this framework, several user oriented performance variables are identified and shown to be proper generalizations of the traditional notions of system performance and reliability. Furthermore, a time varying version of the model is developed to generalize the traditional fault tree reliability evaluation methods of phased missions.

Customizing FP-growth algorithm to parallel mining with Charm++ library

NASA Astrophysics Data System (ADS)

Puścian, Marek

2017-08-01

This paper presents a frequent item mining algorithm that was customized to handle growing data repositories. The proposed solution applies Master Slave scheme to frequent pattern growth technique. Efficient utilization of available computation units is achieved by dynamic reallocation of tasks. Conditional frequent trees are assigned to parallel workers basing on their workload. Proposed enhancements have been successfully implemented using Charm++ library. This paper discusses results of the performance of parallelized FP-growth algorithm against different datasets. The approach has been illustrated with many experiments and measurements performed using multiprocessor and multithreaded computer.
Conference Proceedings of Machine Intelligence for Aerospace Electronic Systems Held in Lisbon, Portugal on 13-16 May 1991 (L’Intelligence Artificielle dans les Systemes Electroniques Aerospatiaux)

DTIC Science & Technology

1991-09-01

more closely matched description is that of finding the distribution with the nature of the application problem. of current flow through a nonuniform ...multiprocessors, may not be effective finding the propagation of wavefronts through a in solving traditional algorithms for these medium with a nonuniform ...properties of the paths are variable cost function is similar to the noteworthy. First, every heading from the start nonuniform resistivity of the plate
Speeding up parallel processing

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1988-01-01

In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
Simulation of electrical and thermal fields in a multimode microwave oven using software written in C++

NASA Astrophysics Data System (ADS)

Abrudean, C.

2017-05-01

Due to multiple reflexions on walls, the electromagnetic field in a multimode microwave oven is difficult to estimate analytically. This paper presents a C++ program that calculates the electromagnetic field in a resonating cavity with an absorbing payload, uses the result to calculate heating in the payload taking its properties into account and then repeats. This results in a simulation of microwave heating, including phenomena like thermal runaway. The program is multithreaded to make use of today’s common multiprocessor/multicore computers.
High speed quantitative digital microscopy

NASA Technical Reports Server (NTRS)

Castleman, K. R.; Price, K. H.; Eskenazi, R.; Ovadya, M. M.; Navon, M. A.

1984-01-01

Modern digital image processing hardware makes possible quantitative analysis of microscope images at high speed. This paper describes an application to automatic screening for cervical cancer. The system uses twelve MC6809 microprocessors arranged in a pipeline multiprocessor configuration. Each processor executes one part of the algorithm on each cell image as it passes through the pipeline. Each processor communicates with its upstream and downstream neighbors via shared two-port memory. Thus no time is devoted to input-output operations as such. This configuration is expected to be at least ten times faster than previous systems.
Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?

DTIC Science & Technology

1994-01-01

400 MB/second. 4 Dubnicki’s work used trace-driven simulation, with traces collected on an 8-processor machine. We would expect such small-scale...312 1 6 32 64 of odk Sb* Bad64.M Figure 17: Miss rate of Ind Blocked LU. Figure 18: MCPR of Ind Blocked LU. overall miss rate of TGauss is a factor of...easily. 17 (’his approach assunics that the model paramelers we collect from simulations with infinite band- width (such as the miss rate and the
United States Air Force Summer Research Program -- 1993 Summer Research Program Final Reports. Volume 11. Arnold Engineering Development Center, Frank J. Seiler Research Laboratory, Wilford Hall Medical Center

DTIC Science & Technology

1993-01-01

external parameters such as airflow, temperature, pressure, etc, are measured. Turbine Engine testing generates massive volumes of data at very high...a form that describes the signal flow graph topology as well as specific parameters of the processing blocks in the diagram. On multiprocessor...provides an interface to the symbolic builder and control functions such that parameters may be set during the build operation that will affect the
Preconditioning Strategies for Solving Elliptic Difference Equations on a Multiprocessor.

DTIC Science & Technology

1982-01-01

162, 1977. (MiGr8O] Mitchell, A., Griffiths, D., The Finite Difference Method in Partial Differential Equations , John Wiley & Sons, 1980. [Munk80...ADAL1b T35 AIR FO"CE INST OF TECH WRITG-PATTERSON AFS OH F/6 12/17PR CO ITIONIN STRATEGIES FOR SOLVING ELLIPTIC DIFFERENCE EWA-ETClU) 9UN S C K...TI TLE (ard S.tbr,,I) 5 TYPE OF REP’ORT & F IFIOD C_JVEFO Preconditioning Strategies for Solving Elliptic THESIS/VYYRY#YY0N Difference Equations on
Designing a Virtual-Memory Implementation Using the Motorola MC68010 16- Bit Microprocessor with Multi-Processor Capability Interfaced to the VMEbus

DTIC Science & Technology

1990-06-01

RAM and ROM output enable signals. Figure C.7 shows the logic for the interrupt priority level (IPLO* through IPL2 *) and the interrupt acknowledge...IACK681* signal is sent to the DUART when a level one interrupt acknowledge is output by the CPU. The logic for the IACK681* and the IPLO* through IPL2 ...signals are actually implemented with an EPLD. Listing D.4 in Appendix D presents the Abel description of the IACK681* and IPLO* through IPL2
A multitasking, multisinked, multiprocessor data acquisition front end

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fox, R.; Au, R.; Molen, A.V.

1989-10-01

The authors have developed a generalized data acquisition front end system which is based on MC68020 processors running a commercial real time kernel (rhoSOS), and implemented primarily in a high level language (C). This system has been attached to the back end on-line computing system at NSCL via our high performance ETHERNET protocol. Data may be simultaneously sent to any number of back end systems. Fixed fraction sampling along links to back end computing is also supported. A nonprocedural program generator simplifies the development of experiment specific code.
Parallel Computing for Probabilistic Response Analysis of High Temperature Composites

NASA Technical Reports Server (NTRS)

Sues, R. H.; Lua, Y. J.; Smith, M. D.

1994-01-01

The objective of this Phase I research was to establish the required software and hardware strategies to achieve large scale parallelism in solving PCM problems. To meet this objective, several investigations were conducted. First, we identified the multiple levels of parallelism in PCM and the computational strategies to exploit these parallelisms. Next, several software and hardware efficiency investigations were conducted. These involved the use of three different parallel programming paradigms and solution of two example problems on both a shared-memory multiprocessor and a distributed-memory network of workstations.
Application Portable Parallel Library

NASA Technical Reports Server (NTRS)

Cole, Gary L.; Blech, Richard A.; Quealy, Angela; Townsend, Scott

1995-01-01

Application Portable Parallel Library (APPL) computer program is subroutine-based message-passing software library intended to provide consistent interface to variety of multiprocessor computers on market today. Minimizes effort needed to move application program from one computer to another. User develops application program once and then easily moves application program from parallel computer on which created to another parallel computer. ("Parallel computer" also include heterogeneous collection of networked computers). Written in C language with one FORTRAN 77 subroutine for UNIX-based computers and callable from application programs written in C language or FORTRAN 77.
KALI - An environment for the programming and control of cooperative manipulators

NASA Technical Reports Server (NTRS)

Hayward, Vincent; Hayati, Samad

1988-01-01

A design description is given of a controller for cooperative robots. The background and motivation for multiple arm control are discussed. A set of programming primitives which permit a programmer to specify cooperative tasks are described. Motion primitives specify asynchronous motions, master/slave motions, and cooperative motions. In the context of cooperative robots, trajectory generation issues are discussed and the authors' implementation briefly described. The relations between programming and control in the case of multiple robots are examined. The allocation of various tasks among a multiprocessor computer is described.
Programmable architecture for pixel level processing tasks in lightweight strapdown IR seekers

NASA Astrophysics Data System (ADS)

Coates, James L.

1993-06-01

Typical processing tasks associated with missile IR seeker applications are described, and a straw man suite of algorithms is presented. A fully programmable multiprocessor architecture is realized on a multimedia video processor (MVP) developed by Texas Instruments. The MVP combines the elements of RISC, floating point, advanced DSPs, graphics processors, display and acquisition control, RAM, and external memory. Front end pixel level tasks typical of missile interceptor applications, operating on 256 x 256 sensor imagery, can be processed at frame rates exceeding 100 Hz in a single MVP chip.
From LPF to eLISA: new approach in payload software

NASA Astrophysics Data System (ADS)

Gesa, Ll.; Martin, V.; Conchillo, A.; Ortega, J. A.; Mateos, I.; Torrents, A.; Lopez-Zaragoza, J. P.; Rivas, F.; Lloro, I.; Nofrarias, M.; Sopuerta, CF.

2017-05-01

eLISA will be the first observatory in space to explore the Gravitational Universe. It will gather revolutionary information about the dark universe. This implies a robust and reliable embedded control software and hardware working together. With the lessons learnt with the LISA Pathfinder payload software as baseline, we will introduce in this short article the key concepts and new approaches that our group is working on in terms of software: multiprocessor, self-modifying-code strategies, 100% hardware and software monitoring, embedded scripting, Time and Space Partition among others.
Ring-array processor distribution topology for optical interconnects

NASA Technical Reports Server (NTRS)

Li, Yao; Ha, Berlin; Wang, Ting; Wang, Sunyu; Katz, A.; Lu, X. J.; Kanterakis, E.

1992-01-01

The existing linear and rectangular processor distribution topologies for optical interconnects, although promising in many respects, cannot solve problems such as clock skews, the lack of supporting elements for efficient optical implementation, etc. The use of a ring-array processor distribution topology, however, can overcome these problems. Here, a study of the ring-array topology is conducted with an aim of implementing various fast clock rate, high-performance, compact optical networks for digital electronic multiprocessor computers. Practical design issues are addressed. Some proof-of-principle experimental results are included.
A heterogeneous hierarchical architecture for real-time computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Skroch, D.A.; Fornaro, R.J.

The need for high-speed data acquisition and control algorithms has prompted continued research in the area of multiprocessor systems and related programming techniques. The result presented here is a unique hardware and software architecture for high-speed real-time computer systems. The implementation of a prototype of this architecture has required the integration of architecture, operating systems and programming languages into a cohesive unit. This report describes a Heterogeneous Hierarchial Architecture for Real-Time (H{sup 2} ART) and system software for program loading and interprocessor communication.
An Expert System Interfaced with a Database System to Perform Troubleshooting of Aircraft Carrier Piping Systems

DTIC Science & Technology

1988-12-01

interval of four feet, and are numbered sequentially bow to stem. * "wing tank" is a tank or void, outboard of the holding bulkhead, away from the center...system and DBMS simultaneously with a multi-processor, allowing queries to the DBMS without terminating the expert system. This method was judged...RECIRC). eductor -strip("Y"):- ask _ques _read_ans(OVBD,"ovbd dis open"),ovbd dis-open(OVBD). eductor-strip("N"):- ask_ques read_ans( LINEUP , "strip lineup
Reader set encoding for directory of shared cache memory in multiprocessor system

DOEpatents

Ahn, Dnaiel; Ceze, Luis H.; Gara, Alan; Ohmacht, Martin; Xiaotong, Zhuang

2014-06-10

In a parallel processing system with speculative execution, conflict checking occurs in a directory lookup of a cache memory that is shared by all processors. In each case, the same physical memory address will map to the same set of that cache, no matter which processor originated that access. The directory includes a dynamic reader set encoding, indicating what speculative threads have read a particular line. This reader set encoding is used in conflict checking. A bitset encoding is used to specify particular threads that have read the line.
Compiling for Application Specific Computational Acceleration in Reconfigurable Architectures Final Report CRADA No. TSB-2033-01

DOE Office of Scientific and Technical Information (OSTI.GOV)

De Supinski, B.; Caliga, D.

2017-09-28

The primary objective of this project was to develop memory optimization technology to efficiently deliver data to, and distribute data within, the SRC-6's Field Programmable Gate Array- ("FPGA") based Multi-Adaptive Processors (MAPs). The hardware/software approach was to explore efficient MAP configurations and generate the compiler technology to exploit those configurations. This memory accessing technology represents an important step towards making reconfigurable symmetric multi-processor (SMP) architectures that will be a costeffective solution for large-scale scientific computing.

Parallel Processing with Digital Signal Processing Hardware and Software

NASA Technical Reports Server (NTRS)

Swenson, Cory V.

1995-01-01

The assembling and testing of a parallel processing system is described which will allow a user to move a Digital Signal Processing (DSP) application from the design stage to the execution/analysis stage through the use of several software tools and hardware devices. The system will be used to demonstrate the feasibility of the Algorithm To Architecture Mapping Model (ATAMM) dataflow paradigm for static multiprocessor solutions of DSP applications. The individual components comprising the system are described followed by the installation procedure, research topics, and initial program development.
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm

NASA Astrophysics Data System (ADS)

Backes, Werner; Wetzel, Susanne

In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schorr-Euchner algorithm for today’s multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread and about factor 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.
It’s an app. It’s a hypervisor. It’s a hypapp. : Design and Implementation of an eXtensible and Modular Hypervisor Framework

DTIC Science & Technology

2012-06-26

existing commercial-grade virtualization solutions (e.g., Xen, Linux KVM, VMware, or L4 ), but generally do not require such rich functional- ity [9...ports multiprocessor configuration on both AMD and Intel x86 platforms. Xen [8], KVM [27], VMware [39], NOVA [33], Virtual- box‡‡ and L4 are some...popular general purpose (open-source) hypervisors and microkernels which have been used for var- ious hypervisor based research [9,15,26,30,32,34,45
A scalable parallel algorithm for multiple objective linear programs

NASA Technical Reports Server (NTRS)

Wiecek, Malgorzata M.; Zhang, Hong

1994-01-01

This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives specially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.
Efficient mapping algorithms for scheduling robot inverse dynamics computation on a multiprocessor system

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Chen, C. L.

1989-01-01

Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
PANDA: A distributed multiprocessor operating system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chubb, P.

1989-01-01

PANDA is a design for a distributed multiprocessor and an operating system. PANDA is designed to allow easy expansion of both hardware and software. As such, the PANDA kernel provides only message passing and memory and process management. The other features needed for the system (device drivers, secondary storage management, etc.) are provided as replaceable user tasks. The thesis presents PANDA's design and implementation, both hardware and software. PANDA uses multiple 68010 processors sharing memory on a VME bus, each such node potentially connected to others via a high speed network. The machine is completely homogeneous: there are no differencesmore » between processors that are detectable by programs running on the machine. A single two-processor node has been constructed. Each processor contains memory management circuits designed to allow processors to share page tables safely. PANDA presents a programmers' model similar to the hardware model: a job is divided into multiple tasks, each having its own address space. Within each task, multiple processes share code and data. Tasks can send messages to each other, and set up virtual circuits between themselves. Peripheral devices such as disc drives are represented within PANDA by tasks. PANDA divides secondary storage into volumes, each volume being accessed by a volume access task, or VAT. All knowledge about the way that data is stored on a disc is kept in its volume's VAT. The design is such that PANDA should provide a useful testbed for file systems and device drivers, as these can be installed without recompiling PANDA itself, and without rebooting the machine.« less
Acceleration of linear stationary iterative processes in multiprocessor computers. II

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romm, Ya.E.

1982-05-01

For pt.I, see Kibernetika, vol.18, no.1, p.47 (1982). For pt.I, see Cybernetics, vol.18, no.1, p.54 (1982). Considers a reduced system of linear algebraic equations x=ax+b, where a=(a/sub ij/) is a real n*n matrix; b is a real vector with common euclidean norm >>>. It is supposed that the existence and uniqueness of solution det (0-a) not equal to e is given, where e is a unit matrix. The linear iterative process converging to x x/sup (k+1)/=fx/sup (k)/, k=0, 1, 2, ..., where the operator f translates r/sup n/ into r/sup n/. In considering implementation of the iterative process (ip) inmore » a multiprocessor system, it is assumed that the number of processors is constant, and are various values of the latter investigated; it is assumed in addition, that the processors perform elementary binary arithmetic operations of addition and multiestimates only include the time of execution of arithmetic operations. With any paralleling of individual iteration, the execution time of the ip is proportional to the number of sequential steps k+1. The author sets the task of reducing the number of sequential steps in the ip so as to execute it in a time proportional to a value smaller than k+1. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the ip, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.« less
Ultra-fast fluence optimization for beam angle selection algorithms

NASA Astrophysics Data System (ADS)

Bangert, M.; Ziegenhein, P.; Oelfke, U.

2014-03-01

Beam angle selection (BAS) including fluence optimization (FO) is among the most extensive computational tasks in radiotherapy. Precomputed dose influence data (DID) of all considered beam orientations (up to 100 GB for complex cases) has to be handled in the main memory and repeated FOs are required for different beam ensembles. In this paper, the authors describe concepts accelerating FO for BAS algorithms using off-the-shelf multiprocessor workstations. The FO runtime is not dominated by the arithmetic load of the CPUs but by the transportation of DID from the RAM to the CPUs. On multiprocessor workstations, however, the speed of data transportation from the main memory to the CPUs is non-uniform across the RAM; every CPU has a dedicated memory location (node) with minimum access time. We apply a thread node binding strategy to ensure that CPUs only access DID from their preferred node. Ideal load balancing for arbitrary beam ensembles is guaranteed by distributing the DID of every candidate beam equally to all nodes. Furthermore we use a custom sorting scheme of the DID to minimize the overall data transportation. The framework is implemented on an AMD Opteron workstation. One FO iteration comprising dose, objective function, and gradient calculation takes between 0.010 s (9 beams, skull, 0.23 GB DID) and 0.070 s (9 beams, abdomen, 1.50 GB DID). Our overall FO time is < 1 s for small cases, larger cases take ~ 4 s. BAS runs including FOs for 1000 different beam ensembles take ~ 15-70 min, depending on the treatment site. This enables an efficient clinical evaluation of different BAS algorithms.
On the impact of communication complexity in the design of parallel numerical algorithms

NASA Technical Reports Server (NTRS)

Gannon, D.; Vanrosendale, J.

1984-01-01

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Implementing TCP/IP and a socket interface as a server in a message-passing operating system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hipp, E.; Wiltzius, D.

1990-03-01

The UNICOS 4.3BSD network code and socket transport interface are the basis of an explicit network server for NLTSS, a message passing operating system on the Cray YMP. A BSD socket user library provides access to the network server using an RPC mechanism. The advantages of this server methodology are its modularity and extensibility to migrate to future protocol suites (e.g. OSI) and transport interfaces. In addition, the network server is implemented in an explicit multi-tasking environment to take advantage of the Cray YMP multi-processor platform. 19 refs., 5 figs.
TLB for Free: In-Cache Address Translation for a Multiprocessor Workstation

DTIC Science & Technology

1985-05-13

LISZT Franz LISP self-compilation I 0.6Mb 145 VAXIMA I Algebraic expert system (a derivative of .MACSY:MA) 1.7Mb 414 CSZOK Two V AXIMA streams...first four were gathered on a VAX running UNIX with an address and instruction tracer [Henr84]. LISZT is the Franz LISP compiler compiling itself...Collisions) (PTE Misses) LISZT 0.584 0.609 0.02.5( 4.3%) (0.009) (0.016) V;\\...’\\lMA 1.855 1.885 0.030(1.6%) (0.004) (0.026) CS100K 2.214 2.260
An approach to solving large reliability models

NASA Technical Reports Server (NTRS)

Boyd, Mark A.; Veeraraghavan, Malathi; Dugan, Joanne Bechta; Trivedi, Kishor S.

1988-01-01

This paper describes a unified approach to the problem of solving large realistic reliability models. The methodology integrates behavioral decomposition, state trunction, and efficient sparse matrix-based numerical methods. The use of fault trees, together with ancillary information regarding dependencies to automatically generate the underlying Markov model state space is proposed. The effectiveness of this approach is illustrated by modeling a state-of-the-art flight control system and a multiprocessor system. Nonexponential distributions for times to failure of components are assumed in the latter example. The modeling tool used for most of this analysis is HARP (the Hybrid Automated Reliability Predictor).
On the parallel solution of parabolic equations

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Youcef

1989-01-01

Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
Droplet flow along the wall of rectangular channel with gradient of wettability

NASA Astrophysics Data System (ADS)

Kupershtokh, A. L.

2018-03-01

The lattice Boltzmann equations (LBE) method (LBM) is applicable for simulating the multiphysics problems of fluid flows with free boundaries, taking into account the viscosity, surface tension, evaporation and wetting degree of a solid surface. Modeling of the nonstationary motion of a drop of liquid along a solid surface with a variable level of wettability is carried out. For the computer simulation of such a problem, the three-dimensional lattice Boltzmann equations method D3Q19 is used. The LBE method allows us to parallelize the calculations on multiprocessor graphics accelerators using the CUDA programming technology.
Optimum spaceborne computer system design by simulation

NASA Technical Reports Server (NTRS)

Williams, T.; Kerner, H.; Weatherbee, J. E.; Taylor, D. S.; Hodges, B.

1973-01-01

A deterministic simulator is described which models the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. Its use as a tool to study and determine the minimum computer system configuration necessary to satisfy the on-board computational requirements of a typical mission is presented. The paper describes how the computer system configuration is determined in order to satisfy the data processing demand of the various shuttle booster subsytems. The configuration which is developed as a result of studies with the simulator is optimal with respect to the efficient use of computer system resources.
Analysis of tasks for dynamic man/machine load balancing in advanced helicopters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jorgensen, C.C.

1987-10-01

This report considers task allocation requirements imposed by advanced helicopter designs incorporating mixes of human pilots and intelligent machines. Specifically, it develops an analogy between load balancing using distributed non-homogeneous multiprocessors and human team functions. A taxonomy is presented which can be used to identify task combinations likely to cause overload for dynamic scheduling and process allocation mechanisms. Designer criteria are given for function decomposition, separation of control from data, and communication handling for dynamic tasks. Possible effects of n-p complete scheduling problems are noted and a class of combinatorial optimization methods are examined.
Intelligent monitoring and control of semiconductor manufacturing equipment

NASA Technical Reports Server (NTRS)

Murdock, Janet L.; Hayes-Roth, Barbara

1991-01-01

The use of AI methods to monitor and control semiconductor fabrication in a state-of-the-art manufacturing environment called the Rapid Thermal Multiprocessor is described. Semiconductor fabrication involves many complex processing steps with limited opportunities to measure process and product properties. By applying additional process and product knowledge to that limited data, AI methods augment classical control methods by detecting abnormalities and trends, predicting failures, diagnosing, planning corrective action sequences, explaining diagnoses or predictions, and reacting to anomalous conditions that classical control systems typically would not correct. Research methodology and issues are discussed, and two diagnosis scenarios are examined.
Apparatus for multiprocessor-based control of a multiagent robot

NASA Technical Reports Server (NTRS)

Peters, II, Richard Alan (Inventor)

2009-01-01

An architecture for robot intelligence enables a robot to learn new behaviors and create new behavior sequences autonomously and interact with a dynamically changing environment. Sensory information is mapped onto a Sensory Ego-Sphere (SES) that rapidly identifies important changes in the environment and functions much like short term memory. Behaviors are stored in a DBAM that creates an active map from the robot's current state to a goal state and functions much like long term memory. A dream state converts recent activities stored in the SES and creates or modifies behaviors in the DBAM.
Communication requirements of sparse Cholesky factorization with nested dissection ordering

NASA Technical Reports Server (NTRS)

Naik, Vijay K.; Patrick, Merrell L.

1989-01-01

Load distribution schemes for minimizing the communication requirements of the Cholesky factorization of dense and sparse, symmetric, positive definite matrices on multiprocessor systems are presented. The total data traffic in factoring an n x n sparse symmetric positive definite matrix representing an n-vertex regular two-dimensional grid graph using n exp alpha, alpha not greater than 1, processors are shown to be O(n exp 1 + alpha/2). It is O(n), when n exp alpha, alpha not smaller than 1, processors are used. Under the conditions of uniform load distribution, these results are shown to be asymptotically optimal.
Optimization of the computational load of a hypercube supercomputer onboard a mobile robot.

PubMed

Barhen, J; Toomarian, N; Protopopescu, V

1987-12-01

A combinatorial optimization methodology is developed, which enables the efficient use of hypercube multiprocessors onboard mobile intelligent robots dedicated to time-critical missions. The methodology is implemented in terms of large-scale concurrent algorithms based either on fast simulated annealing, or on nonlinear asynchronous neural networks. In particular, analytic expressions are given for the effect of singleneuron perturbations on the systems' configuration energy. Compact neuromorphic data structures are used to model effects such as prec xdence constraints, processor idling times, and task-schedule overlaps. Results for a typical robot-dynamics benchmark are presented.

The high hall ventilation with the simplified simulation of the fan

NASA Astrophysics Data System (ADS)

Kyncl, Martin; Pelant, Jaroslav

2018-06-01

Here we work with the system of equations describing the non-stationary compressible turbulent multi-component flow in the gravitational field. We focus on the numerical simulation of the fan situated inside the high hall. The RANS equations are discretized with the use of the finite volume method. The original modification of the Riemann problem and its solution is used at the boundaries. The combination of specific boundary conditions is used for the simulation of the fan. The presented computational results are computed with own-developed code (C, FORTRAN, multiprocessor, unstructured meshes in general).
A tool for modeling concurrent real-time computation

NASA Technical Reports Server (NTRS)

Sharma, D. D.; Huang, Shie-Rei; Bhatt, Rahul; Sridharan, N. S.

1990-01-01

Real-time computation is a significant area of research in general, and in AI in particular. The complexity of practical real-time problems demands use of knowledge-based problem solving techniques while satisfying real-time performance constraints. Since the demands of a complex real-time problem cannot be predicted (owing to the dynamic nature of the environment) powerful dynamic resource control techniques are needed to monitor and control the performance. A real-time computation model for a real-time tool, an implementation of the QP-Net simulator on a Symbolics machine, and an implementation on a Butterfly multiprocessor machine are briefly described.
ALI: A CSSL/multiprocessor software interface

DOE Office of Scientific and Technical Information (OSTI.GOV)

Makoui, A.; Karplus, W.J.

ALI (A Language Interface) is a software package which translates simulation models expressed in one of the higher-level languages, CSSL-IV or ACSL, into sequences of instructions for each processor of a network of microprocessors. The partitioning of the source program among the processors is automatically accomplished. The code is converted into a data flow graph, analyzed and divided among the processors to minimize the overall execution time in the presence of interprocessor communication delays. This paper describes ALI from the user's point of view and includes a detailed example of the application of ALI to a specific dynamic system simulation.
Multiprocessor switch with selective pairing

DOEpatents

Gara, Alan; Gschwind, Michael K; Salapura, Valentina

2014-03-11

System, method and computer program product for a multiprocessing system to offer selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). Each paired microprocessor or processor cores that provide one highly reliable thread for high-reliability connect with a system components such as a memory "nest" (or memory hierarchy), an optional system controller, and optional interrupt controller, optional I/O or peripheral devices, etc. The memory nest is attached to a selective pairing facility via a switch or a bus
Scheduler for multiprocessor system switch with selective pairing

DOEpatents

Gara, Alan; Gschwind, Michael Karl; Salapura, Valentina

2015-01-06

System, method and computer program product for scheduling threads in a multiprocessing system with selective pairing of processor cores for increased processing reliability. A selective pairing facility is provided that selectively connects, i.e., pairs, multiple microprocessor or processor cores to provide one highly reliable thread (or thread group). The method configures the selective pairing facility to use checking provide one highly reliable thread for high-reliability and allocate threads to corresponding processor cores indicating need for hardware checking. The method configures the selective pairing facility to provide multiple independent cores and allocate threads to corresponding processor cores indicating inherent resilience.
On the impact of communication complexity on the design of parallel numerical algorithms

NASA Technical Reports Server (NTRS)

Gannon, D. B.; Van Rosendale, J.

1984-01-01

This paper describes two models of the cost of data movement in parallel numerical alorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
Parallel Gaussian elimination of a block tridiagonal matrix using multiple microcomputers

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1989-01-01

The solution of a block tridiagonal matrix using parallel processing is demonstrated. The multiprocessor system on which results were obtained and the software environment used to program that system are described. Theoretical partitioning and resource allocation for the Gaussian elimination method used to solve the matrix are discussed. The results obtained from running 1, 2 and 3 processor versions of the block tridiagonal solver are presented. The PASCAL source code for these solvers is given in the appendix, and may be transportable to other shared memory parallel processors provided that the synchronization outlines are reproduced on the target system.
Applications considerations in the system design of highly concurrent multiprocessors

NASA Technical Reports Server (NTRS)

Lundstrom, Stephen F.

1987-01-01

A flow model processor approach to parallel processing is described, using very-high-performance individual processors, high-speed circuit switched interconnection networks, and a high-speed synchronization capability to minimize the effect of the inherently serial portions of applications on performance. Design studies related to the determination of the number of processors, the memory organization, and the structure of the networks used to interconnect the processor and memory resources are discussed. Simulations indicate that applications centered on the large shared data memory should be able to sustain over 500 million floating point operations per second.
Hierarchical Poly Tree Configurations for the Solution of Dynamically Refined Finte Element Models

NASA Technical Reports Server (NTRS)

Gute, G. D.; Padovan, J.

1993-01-01

This paper demonstrates how a multilevel substructuring technique, called the Hierarchical Poly Tree (HPT), can be used to integrate a localized mesh refinement into the original finite element model more efficiently. The optimal HPT configurations for solving isoparametrically square h-, p-, and hp-extensions on single and multiprocessor computers is derived. In addition, the reduced number of stiffness matrix elements that must be stored when employing this type of solution strategy is quantified. Moreover, the HPT inherently provides localize 'error-trapping' and a logical, efficient means with which to isolate physically anomalous and analytically singular behavior.
Optimization of the computational load of a hypercube supercomputer onboard a mobile robot

NASA Technical Reports Server (NTRS)

Barhen, Jacob; Toomarian, N.; Protopopescu, V.

1987-01-01

A combinatorial optimization methodology is developed, which enables the efficient use of hypercube multiprocessors onboard mobile intelligent robots dedicated to time-critical missions. The methodology is implemented in terms of large-scale concurrent algorithms based either on fast simulated annealing, or on nonlinear asynchronous neural networks. In particular, analytic expressions are given for the effect of single-neuron perturbations on the systems' configuration energy. Compact neuromorphic data structures are used to model effects such as precedence constraints, processor idling times, and task-schedule overlaps. Results for a typical robot-dynamics benchmark are presented.
Parallel solution of sparse one-dimensional dynamic programming problems

NASA Technical Reports Server (NTRS)

Nicol, David M.

1989-01-01

Parallel computation offers the potential for quickly solving large computational problems. However, it is often a non-trivial task to effectively use parallel computers. Solution methods must sometimes be reformulated to exploit parallelism; the reformulations are often more complex than their slower serial counterparts. We illustrate these points by studying the parallelization of sparse one-dimensional dynamic programming problems, those which do not obviously admit substantial parallelization. We propose a new method for parallelizing such problems, develop analytic models which help us to identify problems which parallelize well, and compare the performance of our algorithm with existing algorithms on a multiprocessor.
Improving Quantum Gate Simulation using a GPU

NASA Astrophysics Data System (ADS)

Gutierrez, Eladio; Romero, Sergio; Trenas, Maria A.; Zapata, Emilio L.

2008-11-01

Due to the increasing computing power of the graphics processing units (GPU), they are becoming more and more popular when solving general purpose algorithms. As the simulation of quantum computers results on a problem with exponential complexity, it is advisable to perform a parallel computation, such as the one provided by the SIMD multiprocessors present in recent GPUs. In this paper, we focus on an important quantum algorithm, the quantum Fourier transform (QTF), in order to evaluate different parallelization strategies on a novel GPU architecture. Our implementation makes use of the new CUDA software/hardware architecture developed recently by NVIDIA.
DIRAC universal pilots

NASA Astrophysics Data System (ADS)

Stagni, F.; McNab, A.; Luzzi, C.; Krzemien, W.; Consortium, DIRAC

2017-10-01

In the last few years, new types of computing models, such as IAAS (Infrastructure as a Service) and IAAC (Infrastructure as a Client), gained popularity. New resources may come as part of pledged resources, while others are in the form of opportunistic ones. Most but not all of these new infrastructures are based on virtualization techniques. In addition, some of them, present opportunities for multi-processor computing slots to the users. Virtual Organizations are therefore facing heterogeneity of the available resources and the use of an Interware software like DIRAC to provide the transparent, uniform interface has become essential. The transparent access to the underlying resources is realized by implementing the pilot model. DIRAC’s newest generation of generic pilots (the so-called Pilots 2.0) are the “pilots for all the skies”, and have been successfully released in production more than a year ago. They use a plugin mechanism that makes them easily adaptable. Pilots 2.0 have been used for fetching and running jobs on every type of resource, being it a Worker Node (WN) behind a CREAM/ARC/HTCondor/DIRAC Computing element, a Virtual Machine running on IaaC infrastructures like Vac or BOINC, on IaaS cloud resources managed by Vcycle, the LHCb High Level Trigger farm nodes, and any type of opportunistic computing resource. Make a machine a “Pilot Machine”, and all diversities between them will disappear. This contribution describes how pilots are made suitable for different resources, and the recent steps taken towards a fully unified framework, including monitoring. Also, the cases of multi-processor computing slots either on real or virtual machines, with the whole node or a partition of it, is discussed.
Development of Parallel Code for the Alaska Tsunami Forecast Model

NASA Astrophysics Data System (ADS)

Bahng, B.; Knight, W. R.; Whitmore, P.

2014-12-01

The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.
Mapping of H.264 decoding on a multiprocessor architecture

NASA Astrophysics Data System (ADS)

van der Tol, Erik B.; Jaspers, Egbert G.; Gelderblom, Rob H.

2003-05-01

Due to the increasing significance of development costs in the competitive domain of high-volume consumer electronics, generic solutions are required to enable reuse of the design effort and to increase the potential market volume. As a result from this, Systems-on-Chip (SoCs) contain a growing amount of fully programmable media processing devices as opposed to application-specific systems, which offered the most attractive solutions due to a high performance density. The following motivates this trend. First, SoCs are increasingly dominated by their communication infrastructure and embedded memory, thereby making the cost of the functional units less significant. Moreover, the continuously growing design costs require generic solutions that can be applied over a broad product range. Hence, powerful programmable SoCs are becoming increasingly attractive. However, to enable power-efficient designs, that are also scalable over the advancing VLSI technology, parallelism should be fully exploited. Both task-level and instruction-level parallelism can be provided by means of e.g. a VLIW multiprocessor architecture. To provide the above-mentioned scalability, we propose to partition the data over the processors, instead of traditional functional partitioning. An advantage of this approach is the inherent locality of data, which is extremely important for communication-efficient software implementations. Consequently, a software implementation is discussed, enabling e.g. SD resolution H.264 decoding with a two-processor architecture, whereas High-Definition (HD) decoding can be achieved with an eight-processor system, executing the same software. Experimental results show that the data communication considerably reduces up to 65% directly improving the overall performance. Apart from considerable improvement in memory bandwidth, this novel concept of partitioning offers a natural approach for optimally balancing the load of all processors, thereby further improving the overall speedup.
Adaptive weld control for high-integrity welding applications

NASA Technical Reports Server (NTRS)

Powell, Bradley W.

1993-01-01

An advanced adaptive control weld system for high-integrity welding applications is presented. The system consists of a state-of-the-art weld control subsystem, motion control subsystem, and sensor subsystem which closes the loop on the process. The adaptive control subsystem (ACS), which is required to totally close the loop on weld process control, consists of a multiprocessor system, data acquisition hardware, and three welding sensors which provide measurements from all areas around the torch in real time. The ACS acquires all 'measurables' and feeds offset trims back into the weld control and motion control subsystems to modify the 'controllables' in order to maintain a previously defined weld quality.
Runtime Performance Monitoring Tool for RTEMS System Software

NASA Astrophysics Data System (ADS)

Cho, B.; Kim, S.; Park, H.; Kim, H.; Choi, J.; Chae, D.; Lee, J.

2007-08-01

RTEMS is a commercial-grade real-time operating system that supports multi-processor computers. However, there are not many development tools for RTEMS. In this paper, we report new RTEMS-based runtime performance monitoring tool. We have implemented a light weight runtime monitoring task with an extension to the RTEMS APIs. Using our tool, software developers can verify various performance- related parameters during runtime. Our tool can be used during software development phase and in-orbit operation as well. Our implemented target agent is light weight and has small overhead using SpaceWire interface. Efforts to reduce overhead and to add other monitoring parameters are currently under research.
Examining the volume efficiency of the cortical architecture in a multi-processor network model.

PubMed

Ruppin, E; Schwartz, E L; Yeshurun, Y

1993-01-01

The convoluted form of the sheet-like mammalian cortex naturally raises the question whether there is a simple geometrical reason for the prevalence of cortical architecture in the brains of higher vertebrates. Addressing this question, we present a formal analysis of the volume occupied by a massively connected network or processors (neurons) and then consider the pertaining cortical data. Three gross macroscopic features of cortical organization are examined: the segregation of white and gray matter, the circumferential organization of the gray matter around the white matter, and the folded cortical structure. Our results testify to the efficiency of cortical architecture.
A real-time programming system.

PubMed

Townsend, H R

1979-03-01

The paper describes a Basic Operating and Scheduling System (BOSS) designed for a small computer. User programs are organised as self-contained modular 'processes' and the way in which the scheduler divides the time of the computer equally between them, while arranging for any process which has to respond to an interrupt from a peripheral device to be given the necessary priority, is described in detail. Next the procedures provided by the operating system to organise communication between processes are described, and how they are used to construct dynamically self-modifying real-time systems. Finally, the general philosophy of BOSS and applications to a multi-processor assembly are discussed.
Comparing the Performance of Two Dynamic Load Distribution Methods

NASA Technical Reports Server (NTRS)

Kale, L. V.

1987-01-01

Parallel processing of symbolic computations on a message-passing multi-processor presents one challenge: To effectively utilize the available processors, the load must be distributed uniformly to all the processors. However, the structure of these computations cannot be predicted in advance. go, static scheduling methods are not applicable. In this paper, we compare the performance of two dynamic, distributed load balancing methods with extensive simulation studies. The two schemes are: the Contracting Within a Neighborhood (CWN) scheme proposed by us, and the Gradient Model proposed by Lin and Keller. We conclude that although simpler, the CWN is significantly more effective at distributing the work than the Gradient model.

Multiprogramming performance degradation - Case study on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Dimpsey, R. T.; Iyer, R. K.

1989-01-01

The performance degradation due to multiprogramming overhead is quantified for a parallel-processing machine. Measurements of real workloads were taken, and it was found that there is a moderate correlation between the completion time of a program and the amount of system overhead measured during program execution. Experiments in controlled environments were then conducted to calculate a lower bound on the performance degradation of parallel jobs caused by multiprogramming overhead. The results show that the multiprogramming overhead of parallel jobs consumes at least 4 percent of the processor time. When two or more serial jobs are introduced into the system, this amount increases to 5.3 percent
A mechanism for efficient debugging of parallel programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, B.P.; Choi, J.D.

1988-01-01

This paper addresses the design and implementation of an integrated debugging system for parallel programs running on shared memory multi-processors (SMMP). The authors describe the use of flowback analysis to provide information on causal relationships between events in a program's execution without re-executing the program for debugging. The authors introduce a mechanism called incremental tracing that, by using semantic analyses of the debugged program, makes the flowback analysis practical with only a small amount of trace generated during execution. The extend flowback analysis to apply to parallel programs and describe a method to detect race conditions in the interactions ofmore » the co-operating processes.« less
Optimum spaceborne computer system design by simulation

NASA Technical Reports Server (NTRS)

Williams, T.; Weatherbee, J. E.; Taylor, D. S.

1972-01-01

A deterministic digital simulation model is described which models the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. Use of the model as a tool in configuring a minimum computer system for a typical mission is demonstrated. The configuration which is developed as a result of studies with the simulator is optimal with respect to the efficient use of computer system resources, i.e., the configuration derived is a minimal one. Other considerations such as increased reliability through the use of standby spares would be taken into account in the definition of a practical system for a given mission.
Design tool for multiprocessor scheduling and evaluation of iterative dataflow algorithms

NASA Technical Reports Server (NTRS)

Jones, Robert L., III

1995-01-01

A graph-theoretic design process and software tool is defined for selecting a multiprocessing scheduling solution for a class of computational problems. The problems of interest are those that can be described with a dataflow graph and are intended to be executed repetitively on a set of identical processors. Typical applications include signal processing and control law problems. Graph-search algorithms and analysis techniques are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool applies the design process to a given problem and includes performance optimization through the inclusion of additional precedence constraints among the schedulable tasks.
Implementation and performance of parallel Prolog interpreter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei, S.; Kale, L.V.; Balkrishna, R.

1988-01-01

In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model which exploits both AND and OR parallelism in logic programs. It is machine independent as it runs on top of the chare-kernel--a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark pargrams on parallel machines including shared memory systems: an Alliant FX/8, Sequent and a MultiMax, and a non-shared memory systems: Intel iPSC/32 hypercube, in addition to its performance on a multiprocessor simulation system.
Instructional image processing on a university mainframe: The Kansas system

NASA Technical Reports Server (NTRS)

Williams, T. H. L.; Siebert, J.; Gunn, C.

1981-01-01

An interactive digital image processing program package was developed that runs on the University of Kansas central computer, a Honeywell Level 66 multi-processor system. The module form of the package allows easy and rapid upgrades and extensions of the system and is used in remote sensing courses in the Department of Geography, in regional five-day short courses for academics and professionals, and also in remote sensing projects and research. The package comprises three self-contained modules of processing functions: Subimage extraction and rectification; image enhancement, preprocessing and data reduction; and classification. Its use in a typical course setting is described. Availability and costs are considered.
Closed-form solutions of performability. [in computer systems

NASA Technical Reports Server (NTRS)

Meyer, J. F.

1982-01-01

It is noted that if computing system performance is degradable then system evaluation must deal simultaneously with aspects of both performance and reliability. One approach is the evaluation of a system's performability which, relative to a specified performance variable Y, generally requires solution of the probability distribution function of Y. The feasibility of closed-form solutions of performability when Y is continuous are examined. In particular, the modeling of a degradable buffer/multiprocessor system is considered whose performance Y is the (normalized) average throughput rate realized during a bounded interval of time. Employing an approximate decomposition of the model, it is shown that a closed-form solution can indeed be obtained.
A CPU benchmark for protein crystallographic refinement.

PubMed

Bourne, P E; Hendrickson, W A

1990-01-01

The CPU time required to complete a cycle of restrained least-squares refinement of a protein structure from X-ray crystallographic data using the FORTRAN codes PROTIN and PROLSQ are reported for 48 different processors, ranging from single-user workstations to supercomputers. Sequential, vector, VLIW, multiprocessor, and RISC hardware architectures are compared using both a small and a large protein structure. Representative compile times for each hardware type are also given, and the improvement in run-time when coding for a specific hardware architecture considered. The benchmarks involve scalar integer and vector floating point arithmetic and are representative of the calculations performed in many scientific disciplines.
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Rao, Hariprasad Nannapaneni

1989-01-01

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Parallel computation with the force

NASA Technical Reports Server (NTRS)

Jordan, H. F.

1985-01-01

A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiproceossor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.
Software environment for implementing engineering applications on MIMD computers

NASA Technical Reports Server (NTRS)

Lopez, L. A.; Valimohamed, K. A.; Schiff, S.

1990-01-01

In this paper the concept for a software environment for developing engineering application systems for multiprocessor hardware (MIMD) is presented. The philosophy employed is to solve the largest problems possible in a reasonable amount of time, rather than solve existing problems faster. In the proposed environment most of the problems concerning parallel computation and handling of large distributed data spaces are hidden from the application program developer, thereby facilitating the development of large-scale software applications. Applications developed under the environment can be executed on a variety of MIMD hardware; it protects the application software from the effects of a rapidly changing MIMD hardware technology.
A partitioning strategy for nonuniform problems on multiprocessors

NASA Technical Reports Server (NTRS)

Berger, M. J.; Bokhari, S.

1985-01-01

The partitioning of a problem on a domain with unequal work estimates in different subddomains is considered in a way that balances the work load across multiple processors. Such a problem arises for example in solving partial differential equations using an adaptive method that places extra grid points in certain subregions of the domain. A binary decomposition of the domain is used to partition it into rectangles requiring equal computational effort. The communication costs of mapping this partitioning onto different microprocessors: a mesh-connected array, a tree machine and a hypercube is then studied. The communication cost expressions can be used to determine the optimal depth of the above partitioning.
Development of Ada language control software for the NASA power management and distribution test bed

NASA Technical Reports Server (NTRS)

Wright, Ted; Mackin, Michael; Gantose, Dave

1989-01-01

The Ada language software developed to control the NASA Lewis Research Center's Power Management and Distribution testbed is described. The testbed is a reduced-scale prototype of the electric power system to be used on space station Freedom. It is designed to develop and test hardware and software for a 20-kHz power distribution system. The distributed, multiprocessor, testbed control system has an easy-to-use operator interface with an understandable English-text format. A simple interface for algorithm writers that uses the same commands as the operator interface is provided, encouraging interactive exploration of the system.
A comparison of multiprocessor scheduling methods for iterative data flow architectures

NASA Technical Reports Server (NTRS)

Storch, Matthew

1993-01-01

A comparative study is made between the Algorithm to Architecture Mapping Model (ATAMM) and three other related multiprocessing models from the published literature. The primary focus of all four models is the non-preemptive scheduling of large-grain iterative data flow graphs as required in real-time systems, control applications, signal processing, and pipelined computations. Important characteristics of the models such as injection control, dynamic assignment, multiple node instantiations, static optimum unfolding, range-chart guided scheduling, and mathematical optimization are identified. The models from the literature are compared with the ATAMM for performance, scheduling methods, memory requirements, and complexity of scheduling and design procedures.
Memory interface simulator: A computer design aid

NASA Technical Reports Server (NTRS)

Taylor, D. S.; Williams, T.; Weatherbee, J. E.

1972-01-01

Results are presented of a study conducted with a digital simulation model being used in the design of the Automatically Reconfigurable Modular Multiprocessor System (ARMMS), a candidate computer system for future manned and unmanned space missions. The model simulates the activity involved as instructions are fetched from random access memory for execution in one of the system central processing units. A series of model runs measured instruction execution time under various assumptions pertaining to the CPU's and the interface between the CPU's and RAM. Design tradeoffs are presented in the following areas: Bus widths, CPU microprogram read only memory cycle time, multiple instruction fetch, and instruction mix.
Numerical Simulation of the Flow over a Segment-Conical Body on the Basis of Reynolds Equations

NASA Astrophysics Data System (ADS)

Egorov, I. V.; Novikov, A. V.; Palchekovskaya, N. V.

2018-01-01

Numerical simulation was used to study the 3D supersonic flow over a segment-conical body similar in shape to the ExoMars space vehicle. The nonmonotone behavior of the normal force acting on the body placed in a supersonic gas flow was analyzed depending on the angle of attack. The simulation was based on the numerical solution of the unsteady Reynolds-averaged Navier-Stokes equations with a two-parameter differential turbulence model. The solution of the problem was obtained using the in-house solver HSFlow with an efficient parallel algorithm intended for multiprocessor super computers.
Dynamic resource allocation in a hierarchical multiprocessor system: A preliminary study

NASA Technical Reports Server (NTRS)

Ngai, Tin-Fook

1986-01-01

An integrated system approach to dynamic resource allocation is proposed. Some of the problems in dynamic resource allocation and the relationship of these problems to system structures are examined. A general dynamic resource allocation scheme is presented. A hierarchial system architecture which dynamically maps between processor structure and programs at multiple levels of instantiations is described. Simulation experiments were conducted to study dynamic resource allocation on the proposed system. Preliminary evaluation based on simple dynamic resource allocation algorithms indicates that with the proposed system approach, the complexity of dynamic resource management could be significantly reduced while achieving reasonable effective dynamic resource allocation.
Studies of an Optical Multi-Processor Interconnect

DTIC Science & Technology

1994-01-01

that the destinations are uniformly distributed, I is given by (E2k-i 2 k(l -(1 -Pd)i)]ýE) N g -1 i+ ( -- Pd)- +2k- 1 2k - 2’-k’ + k(1 -- (1 -- pd)k...15oo 2000- Number of Nodes Figure 5.16: Variation of Maximum User Throughput with Size I I 94I I I I6 Io•MSe P (snowed) g (dsehd) o𔃺.0 OJU/ I£ 0. 00...curves. Table (6.9) shows that the size and topology of the network does not have any significant effect on the number g of threads needed to keep the
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
Load Balancing Unstructured Adaptive Grids for CFD Problems

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid

1996-01-01

Mesh adaption is a powerful tool for efficient unstructured-grid computations but causes load imbalance among processors on a parallel machine. A dynamic load balancing method is presented that balances the workload across all processors with a global view. After each parallel tetrahedral mesh adaption, the method first determines if the new mesh is sufficiently unbalanced to warrant a repartitioning. If so, the adapted mesh is repartitioned, with new partitions assigned to processors so that the redistribution cost is minimized. The new partitions are accepted only if the remapping cost is compensated by the improved load balance. Results indicate that this strategy is effective for large-scale scientific computations on distributed-memory multiprocessors.

FPGA-based multiprocessor system for injection molding control.

PubMed

Muñoz-Barron, Benigno; Morales-Velazquez, Luis; Romero-Troncoso, Rene J; Rodriguez-Donate, Carlos; Trejo-Hernandez, Miguel; Benitez-Rangel, Juan P; Osornio-Rios, Roque A

2012-10-18

The plastic industry is a very important manufacturing sector and injection molding is a widely used forming method in that industry. The contribution of this work is the development of a strategy to retrofit control of an injection molding machine based on an embedded system microprocessors sensor network on a field programmable gate array (FPGA) device. Six types of embedded processors are included in the system: a smart-sensor processor, a micro fuzzy logic controller, a programmable logic controller, a system manager, an IO processor and a communication processor. Temperature, pressure and position are controlled by the proposed system and experimentation results show its feasibility and robustness. As validation of the present work, a particular sample was successfully injected.
HEP - A semaphore-synchronized multiprocessor with central control. [Heterogeneous Element Processor

NASA Technical Reports Server (NTRS)

Gilliland, M. C.; Smith, B. J.; Calvert, W.

1976-01-01

The paper describes the design concept of the Heterogeneous Element Processor (HEP), a system tailored to the special needs of scientific simulation. In order to achieve high-speed computation required by simulation, HEP features a hierarchy of processes executing in parallel on a number of processors, with synchronization being largely accomplished by hardware. A full-empty-reserve scheme of synchronization is realized by zero-one-valued hardware semaphores. A typical system has, besides the control computer and the scheduler, an algebraic module, a memory module, a first-in first-out (FIFO) module, an integrator module, and an I/O module. The architecture of the scheduler and the algebraic module is examined in detail.
TRAMP; The next generation data acquisition for RTP

DOE Office of Scientific and Technical Information (OSTI.GOV)

van Haren, P.C.; Wijnoltz, F.

1992-04-01

The Rijnhuizen Tokamak Project RTP is a medium-sized tokamak experiment, which requires a very reliable data-acquisition system, due to its pulsed nature. Analyzing the limitations of an existing CAMAC-based data-acquisition system showed, that substantial increase of performance and flexibility could best be obtained by the construction of an entirely new system. This paper discusses this system, CALLED TRAMP (Transient Recorder and Amoeba Multi Processor), based on tailor-made transient recorders with a multiprocessor computer system in VME running Amoeba. The performance of TRAMP exceeds the performance of the CAMAC system by a factor of four. The plans to increase the flexibilitymore » and for a further increase of performance are presented.« less
UTDallas Offline Computing System for B Physics with the Babar Experiment at SLAC

NASA Astrophysics Data System (ADS)

Benninger, Tracy L.

1998-10-01

The University of Texas at Dallas High Energy Physics group is building a high performance, large storage computing system for B physics research with the BaBar experiment (``factory'') at the Stanford Linear Accelerator Center. The goal of this system is to analyze one terabyte of complex Event Store data from BaBar in one to two days. The foundation of the computing system is a Sun E6000 Enterprise multiprocessor system, with additions of a Sun StorEdge L1800 Tape Library, a Sun Workstation for processing batch jobs, staging disks and interface cards. The design considerations, current status, projects underway, and possible upgrade paths will be discussed.
An effective write policy for software coherence schemes

NASA Technical Reports Server (NTRS)

Chen, Yung-Chin; Veidenbaum, Alexander V.

1992-01-01

The authors study the write behavior and evaluate the performance of various write strategies and buffering techniques for a MIN-based multiprocessor system using the simple software coherence scheme. Hit ratios, memory latencies, total execution time, and total write traffic are used as the performance indices. The write-through write-allocate no-fetch cache using a write-back write buffer is shown to have a better performance than both write-through and write-back caches. This type of write buffer is effective in reducing the volume as well as bursts of write traffic. On average, the use of a write-back cache reduces by 60 percent the total write traffic generated by a write-through cache.
Low Latency Messages on Distributed Memory Multiprocessors

DOE PAGES

Rosing, Matt; Saltz, Joel

1995-01-01

This article describes many of the issues in developing an efficient interface for communication on distributed memory machines. Although the hardware component of message latency is less than 1 ws on many distributed memory machines, the software latency associated with sending and receiving typed messages is on the order of 50 μs. The reason for this imbalance is that the software interface does not match the hardware. By changing the interface to match the hardware more closely, applications with fine grained communication can be put on these machines. This article describes several tests performed and many of the issues involvedmore » in supporting low latency messages on distributed memory machines.« less
Data management system advanced development

NASA Technical Reports Server (NTRS)

Douglas, Katherine; Humphries, Terry

1990-01-01

The Data Management System (DMS) Advanced Development task provides for the development of concepts, new tools, DMS services, and for the testing of the Space Station DMS hardware and software. It also provides for the development of techniques capable of determining the effects of system changes/enhancements, additions of new technology, and/or hardware and software growth on system performance. This paper will address the built-in characteristics which will support network monitoring requirements in the design of the evolving DMS network implementation, functional and performance requirements for a real-time, multiprogramming, multiprocessor operating system, and the possible use of advanced development techniques such as expert systems and artificial intelligence tools in the DMS design.
Performance effects of irregular communications patterns on massively parallel multiprocessors

NASA Technical Reports Server (NTRS)

Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam

1991-01-01

A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings with unstructured meshes generated for aerodynamic codes and a set of sparse matrices with banded patterns on non-zeroes were performed. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.
Supercomputer optimizations for stochastic optimal control applications

NASA Technical Reports Server (NTRS)

Chung, Siu-Leung; Hanson, Floyd B.; Xu, Huihuang

1991-01-01

Supercomputer optimizations for a computational method of solving stochastic, multibody, dynamic programming problems are presented. The computational method is valid for a general class of optimal control problems that are nonlinear, multibody dynamical systems, perturbed by general Markov noise in continuous time, i.e., nonsmooth Gaussian as well as jump Poisson random white noise. Optimization techniques for vector multiprocessors or vectorizing supercomputers include advanced data structures, loop restructuring, loop collapsing, blocking, and compiler directives. These advanced computing techniques and superconducting hardware help alleviate Bellman's curse of dimensionality in dynamic programming computations, by permitting the solution of large multibody problems. Possible applications include lumped flight dynamics models for uncertain environments, such as large scale and background random aerospace fluctuations.
Optical backplane interconnect switch for data processors and computers

NASA Technical Reports Server (NTRS)

Hendricks, Herbert D.; Benz, Harry F.; Hammer, Jacob M.

1989-01-01

An optoelectronic integrated device design is reported which can be used to implement an all-optical backplane interconnect switch. The switch is sized to accommodate an array of processors and memories suitable for direct replacement into the basic avionic multiprocessor backplane. The optical backplane interconnect switch is also suitable for direct replacement of the PI bus traffic switch and at the same time, suitable for supporting pipelining of the processor and memory. The 32 bidirectional switchable interconnects are configured with broadcast capability for controls, reconfiguration, and messages. The approach described here can handle a serial interconnection of data processors or a line-to-link interconnection of data processors. An optical fiber demonstration of this approach is presented.
Comparing the OpenMP, MPI, and Hybrid Programming Paradigm on an SMP Cluster

NASA Technical Reports Server (NTRS)

Jost, Gabriele; Jin, Hao-Qiang; anMey, Dieter; Hatay, Ferhat F.

2003-01-01

Clusters of SMP (Symmetric Multi-Processors) nodes provide support for a wide range of parallel programming paradigms. The shared address space within each node is suitable for OpenMP parallelization. Message passing can be employed within and across the nodes of a cluster. Multiple levels of parallelism can be achieved by combining message passing and OpenMP parallelization. Which programming paradigm is the best will depend on the nature of the given problem, the hardware components of the cluster, the network, and the available software. In this study we compare the performance of different implementations of the same CFD benchmark application, using the same numerical algorithm but employing different programming paradigms.
From Wheatstone to Cameron and beyond: overview in 3-D and 4-D imaging technology

NASA Astrophysics Data System (ADS)

Gilbreath, G. Charmaine

2012-02-01

This paper reviews three-dimensional (3-D) and four-dimensional (4-D) imaging technology, from Wheatstone through today, with some prognostications for near future applications. This field is rich in variety, subject specialty, and applications. A major trend, multi-view stereoscopy, is moving the field forward to real-time wide-angle 3-D reconstruction as breakthroughs in parallel processing and multi-processor computers enable very fast processing. Real-time holography meets 4-D imaging reconstruction at the goal of achieving real-time, interactive, 3-D imaging. Applications to telesurgery and telemedicine as well as to the needs of the defense and intelligence communities are also discussed.
On-line data analysis and monitoring for H1 drift chambers

NASA Astrophysics Data System (ADS)

Düllmann, Dirk

1992-05-01

The on-line monitoring, slow control and calibration of the H1 central jet chamber uses a VME multiprocessor system to perform the analysis and a connected Macintosh computer as graphical interface to the operator on shift. Task of this system are: - analysis of event data including on-line track search, - on-line calibration from normal events and testpulse events, - control of the high voltage and monitoring of settings and currents, - monitoring of temperature, pressure and mixture of the chambergas. A program package is described which controls the dataflow between data aquisition, differnt VME CPUs and Macintosh. It allows to run off-line style programs for the different tasks.
Error recovery in shared memory multiprocessors using private caches

NASA Technical Reports Server (NTRS)

Wu, Kun-Lung; Fuchs, W. Kent; Patel, Janak H.

1990-01-01

The problem of recovering from processor transient faults in shared memory multiprocesses systems is examined. A user-transparent checkpointing and recovery scheme using private caches is presented. Processes can recover from errors due to faulty processors by restarting from the checkpointed computation state. Implementation techniques using checkpoint identifiers and recovery stacks are examined as a means of reducing performance degradation in processor utilization during normal execution. This cache-based checkpointing technique prevents rollback propagation, provides rapid recovery, and can be integrated into standard cache coherence protocols. An analytical model is used to estimate the relative performance of the scheme during normal execution. Extensions to take error latency into account are presented.
Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Tong, Charles; Swarztrauber, Paul N.

1991-01-01

The present evaluation of alternative, massively parallel hypercube processor-applicable designs for ordered radix-2 decimation-in-frequency FFT algorithms gives attention to the reduction of computation time-dominating communication. A combination of the order and computational phases of the FFT is accordingly employed, in conjunction with sequence-to-processor maps which reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities. A parallel method for trigonometric coefficient computation is presented which does not employ trigonometric functions or interprocessor communication.
Software Mechanisms for Multiprocessor TLB Consistency

DTIC Science & Technology

1989-12-01

thanks. Raj V aswani implemented the DASH message-passing system. Ramesh Govindan implemented part of the DASH virtual memory system. G . Scott...1 Latency (ma) 5𔃺 ---·-r··-·r···r··-T·-·-r··-·r····r····-r··-·r··- . g ~~~R=r 18 -----1·--·-r··-T·----1·--··r·---1·----1·--··r·---T·----1 16...model development. Synchronizing TLBs is similar to updating replicated data in a distributed environment. Lee and Garcia-Molina both used an M/ G /1
3-dimensional magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on symmetric multiprocessor computers - Part II: direct data-space inverse solution

NASA Astrophysics Data System (ADS)

Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

2016-01-01

Following the creation described in Part I of a deformable edge finite-element simulator for 3-D magnetotelluric (MT) responses using direct solvers, in Part II we develop an algorithm named HexMT for 3-D regularized inversion of MT data including topography. Direct solvers parallelized on large-RAM, symmetric multiprocessor (SMP) workstations are used also for the Gauss-Newton model update. By exploiting the data-space approach, the computational cost of the model update becomes much less in both time and computer memory than the cost of the forward simulation. In order to regularize using the second norm of the gradient, we factor the matrix related to the regularization term and apply its inverse to the Jacobian, which is done using the MKL PARDISO library. For dense matrix multiplication and factorization related to the model update, we use the PLASMA library which shows very good scalability across processor cores. A synthetic test inversion using a simple hill model shows that including topography can be important; in this case depression of the electric field by the hill can cause false conductors at depth or mask the presence of resistive structure. With a simple model of two buried bricks, a uniform spatial weighting for the norm of model smoothing recovered more accurate locations for the tomographic images compared to weightings which were a function of parameter Jacobians. We implement joint inversion for static distortion matrices tested using the Dublin secret model 2, for which we are able to reduce nRMS to ˜1.1 while avoiding oscillatory convergence. Finally we test the code on field data by inverting full impedance and tipper MT responses collected around Mount St Helens in the Cascade volcanic chain. Among several prominent structures, the north-south trending, eruption-controlling shear zone is clearly imaged in the inversion.
Scalable parallel communications

NASA Technical Reports Server (NTRS)

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

1992-01-01

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
Improvement of multiprocessing performance by using optical centralized shared bus

NASA Astrophysics Data System (ADS)

Han, Xuliang; Chen, Ray T.

2004-06-01

With the ever-increasing need to solve larger and more complex problems, multiprocessing is attracting more and more research efforts. One of the challenges facing the multiprocessor designers is to fulfill in an effective manner the communications among the processes running in parallel on multiple multiprocessors. The conventional electrical backplane bus provides narrow bandwidth as restricted by the physical limitations of electrical interconnects. In the electrical domain, in order to operate at high frequency, the backplane topology has been changed from the simple shared bus to the complicated switched medium. However, the switched medium is an indirect network. It cannot support multicast/broadcast as effectively as the shared bus. Besides the additional latency of going through the intermediate switching nodes, signal routing introduces substantial delay and considerable system complexity. Alternatively, optics has been well known for its interconnect capability. Therefore, it has become imperative to investigate how to improve multiprocessing performance by utilizing optical interconnects. From the implementation standpoint, the existing optical technologies still cannot fulfill the intelligent functions that a switch fabric should provide as effectively as their electronic counterparts. Thus, an innovative optical technology that can provide sufficient bandwidth capacity, while at the same time, retaining the essential merits of the shared bus topology, is highly desirable for the multiprocessing performance improvement. In this paper, the optical centralized shared bus is proposed for use in the multiprocessing systems. This novel optical interconnect architecture not only utilizes the beneficial characteristics of optics, but also retains the desirable properties of the shared bus topology. Meanwhile, from the architecture standpoint, it fits well in the centralized shared-memory multiprocessing scheme. Therefore, a smooth migration with substantial multiprocessing performance improvement is expected. To prove the technical feasibility from the architecture standpoint, a conceptual emulation of the centralized shared-memory multiprocessing scheme is demonstrated on a generic PCI subsystem with an optical centralized shared bus.
A parallel algorithm for multi-level logic synthesis using the transduction method. M.S. Thesis

NASA Technical Reports Server (NTRS)

Lim, Chieng-Fai

1991-01-01

The Transduction Method has been shown to be a powerful tool in the optimization of multilevel networks. Many tools such as the SYLON synthesis system (X90), (CM89), (LM90) have been developed based on this method. A parallel implementation is presented of SYLON-XTRANS (XM89) on an eight processor Encore Multimax shared memory multiprocessor. It minimizes multilevel networks consisting of simple gates through parallel pruning, gate substitution, gate merging, generalized gate substitution, and gate input reduction. This implementation, called Parallel TRANSduction (PTRANS), also uses partitioning to break large circuits up and performs inter- and intra-partition dynamic load balancing. With this, good speedups and high processor efficiencies are achievable without sacrificing the resulting circuit quality.

Local concurrent error detection and correction in data structures using virtual backpointers

NASA Technical Reports Server (NTRS)

Li, C. C.; Chen, P. P.; Fuchs, W. K.

1987-01-01

A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data structures is presented. Two new data structures, the Virtual Double Linked List, and the B-tree with Virtual Backpointers, are described. For these structures, double errors can be detected in 0(1) time and errors detected during forward moves can be corrected in 0(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on mean time to failure of a multi-user shared database system. The implementation utilizes a Sequent shared memory multiprocessor system operating on a shared databased of Virtual Double Linked Lists.
Local concurrent error detection and correction in data structures using virtual backpointers

NASA Technical Reports Server (NTRS)

Li, Chung-Chi Jim; Chen, Paul Peichuan; Fuchs, W. Kent

1989-01-01

A new technique, based on virtual backpointers, for local concurrent error detection and correction in linked data strutures is presented. Two new data structures, the Virtual Double Linked List, and the B-tree with Virtual Backpointers, are described. For these structures, double errors can be detected in 0(1) time and errors detected during forward moves can be corrected in 0(1) time. The application of a concurrent auditor process to data structure error detection and correction is analyzed, and an implementation is described, to determine the effect on mean time to failure of a multi-user shared database system. The implementation utilizes a Sequent shared memory multiprocessor system operating on a shared database of Virtual Double Linked Lists.
Performance analysis of parallel branch and bound search with the hypercube architecture

NASA Technical Reports Server (NTRS)

Mraz, Richard T.

1987-01-01

With the availability of commercial parallel computers, researchers are examining new classes of problems which might benefit from parallel computing. This paper presents results of an investigation of the class of search intensive problems. The specific problem discussed is the Least-Cost Branch and Bound search method of deadline job scheduling. The object-oriented design methodology was used to map the problem into a parallel solution. While the initial design was good for a prototype, the best performance resulted from fine-tuning the algorithm for a specific computer. The experiments analyze the computation time, the speed up over a VAX 11/785, and the load balance of the problem when using loosely coupled multiprocessor system based on the hypercube architecture.
Towards developing robust algorithms for solving partial differential equations on MIMD machines

NASA Technical Reports Server (NTRS)

Saltz, Joel H.; Naik, Vijay K.

1988-01-01

Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system.
Desktop supercomputer: what can it do?

NASA Astrophysics Data System (ADS)

Bogdanov, A.; Degtyarev, A.; Korkhov, V.

2017-12-01

The paper addresses the issues of solving complex problems that require using supercomputers or multiprocessor clusters available for most researchers nowadays. Efficient distribution of high performance computing resources according to actual application needs has been a major research topic since high-performance computing (HPC) technologies became widely introduced. At the same time, comfortable and transparent access to these resources was a key user requirement. In this paper we discuss approaches to build a virtual private supercomputer available at user's desktop: a virtual computing environment tailored specifically for a target user with a particular target application. We describe and evaluate possibilities to create the virtual supercomputer based on light-weight virtualization technologies, and analyze the efficiency of our approach compared to traditional methods of HPC resource management.
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2011-01-11

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
Managing coherence via put/get windows

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-02-21

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.
FPGA-Based Multiprocessor System for Injection Molding Control

PubMed Central

Muñoz-Barron, Benigno; Morales-Velazquez, Luis; Romero-Troncoso, Rene J.; Rodriguez-Donate, Carlos; Trejo-Hernandez, Miguel; Benitez-Rangel, Juan P.; Osornio-Rios, Roque A.

2012-01-01

The plastic industry is a very important manufacturing sector and injection molding is a widely used forming method in that industry. The contribution of this work is the development of a strategy to retrofit control of an injection molding machine based on an embedded system microprocessors sensor network on a field programmable gate array (FPGA) device. Six types of embedded processors are included in the system: a smart-sensor processor, a micro fuzzy logic controller, a programmable logic controller, a system manager, an IO processor and a communication processor. Temperature, pressure and position are controlled by the proposed system and experimentation results show its feasibility and robustness. As validation of the present work, a particular sample was successfully injected. PMID:23202036
Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

NASA Astrophysics Data System (ADS)

Boyko, Oleksiy; Zheleznyak, Mark

2015-04-01

The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI ( Todini et al, 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessors systems - multicore/processors PC and clusters. Algorithm is based on binary-tree decomposition of the watershed for the balancing of the amount of computation for all processors/cores. Message passing interface (MPI) protocol is used as a parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated for the case studies for the flood predictions of the mountain watersheds of the Ukrainian Carpathian regions. The modeling results is compared with the predictions based on the lumped parameters models.
Methodologies and systems for heterogeneous concurrent computing

NASA Technical Reports Server (NTRS)

Sunderam, V. S.

1994-01-01

Heterogeneous concurrent computing is gaining increasing acceptance as an alternative or complementary paradigm to multiprocessor-based parallel processing as well as to conventional supercomputing. While algorithmic and programming aspects of heterogeneous concurrent computing are similar to their parallel processing counterparts, system issues, partitioning and scheduling, and performance aspects are significantly different. In this paper, we discuss critical design and implementation issues in heterogeneous concurrent computing, and describe techniques for enhancing its effectiveness. In particular, we highlight the system level infrastructures that are required, aspects of parallel algorithm development that most affect performance, system capabilities and limitations, and tools and methodologies for effective computing in heterogeneous networked environments. We also present recent developments and experiences in the context of the PVM system and comment on ongoing and future work.
Surprise

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curran, L.

1988-03-03

Interest has been building in recent months over the imminent arrival of a new class of supercomputer, called the ''supercomputer on a desk'' or the single-user model. Most observers expected the first such product to come from either of two startups, Ardent Computer Corp. or Stellar Computer Inc. But a surprise entry has shown up. Apollo Computer Inc. is launching a new work station this week that racks up an impressive list of industry first as it puts supercomputer power at the disposal of a single user. The new series 10000 from the Chelmsford, Mass., a company is built aroundmore » a reduced-instruction-set architecture that the company calls Prism, for parallel reduced-instruction-set multiprocessor. This article describes the 10000 and Prism.« less
Towards developing robust algorithms for solving partial differential equations on MIMD machines

NASA Technical Reports Server (NTRS)

Saltz, J. H.; Naik, V. K.

1985-01-01

Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system.
Multiple-function multi-input/multi-output digital control and on-line analysis

NASA Technical Reports Server (NTRS)

Hoadley, Sherwood T.; Wieseman, Carol D.; Mcgraw, Sandra M.

1992-01-01

The design and capabilities of two digital controller systems for aeroelastic wind-tunnel models are described. The first allowed control of flutter while performing roll maneuvers with wing load control as well as coordinating the acquisition, storage, and transfer of data for on-line analysis. This system, which employs several digital signal multi-processor (DSP) boards programmed in high-level software languages, is housed in a SUN Workstation environment. A second DCS provides a measure of wind-tunnel safety by functioning as a trip system during testing in the case of high model dynamic response or in case the first DCS fails. The second DCS uses National Instruments LabVIEW Software and Hardware within a Macintosh environment.
Knowledge-based design of generate-and-patch problem solvers that solve global resource assignment problems

NASA Technical Reports Server (NTRS)

Voigt, Kerstin

1992-01-01

We present MENDER, a knowledge based system that implements software design techniques that are specialized to automatically compile generate-and-patch problem solvers that satisfy global resource assignments problems. We provide empirical evidence of the superior performance of generate-and-patch over generate-and-test: even with constrained generation, for a global constraint in the domain of '2D-floorplanning'. For a second constraint in '2D-floorplanning' we show that even when it is possible to incorporate the constraint into a constrained generator, a generate-and-patch problem solver may satisfy the constraint more rapidly. We also briefly summarize how an extended version of our system applies to a constraint in the domain of 'multiprocessor scheduling'.
Compiler analysis for irregular problems in FORTRAN D

NASA Technical Reports Server (NTRS)

Vonhanxleden, Reinhard; Kennedy, Ken; Koelbel, Charles; Das, Raja; Saltz, Joel

1992-01-01

We developed a dataflow framework which provides a basis for rigorously defining strategies to make use of runtime preprocessing methods for distributed memory multiprocessors. In many programs, several loops access the same off-processor memory locations. Our runtime support gives us a mechanism for tracking and reusing copies of off-processor data. A key aspect of our compiler analysis strategy is to determine when it is safe to reuse copies of off-processor data. Another crucial function of the compiler analysis is to identify situations which allow runtime preprocessing overheads to be amortized. This dataflow analysis will make it possible to effectively use the results of interprocedural analysis in our efforts to reduce interprocessor communication and the need for runtime preprocessing.
Progress and supercomputing in computational fluid dynamics; Proceedings of U.S.-Israel Workshop, Jerusalem, Israel, December 1984

NASA Technical Reports Server (NTRS)

Murman, E. M. (Editor); Abarbanel, S. S. (Editor)

1985-01-01

Current developments and future trends in the application of supercomputers to computational fluid dynamics are discussed in reviews and reports. Topics examined include algorithm development for personal-size supercomputers, a multiblock three-dimensional Euler code for out-of-core and multiprocessor calculations, simulation of compressible inviscid and viscous flow, high-resolution solutions of the Euler equations for vortex flows, algorithms for the Navier-Stokes equations, and viscous-flow simulation by FEM and related techniques. Consideration is given to marching iterative methods for the parabolized and thin-layer Navier-Stokes equations, multigrid solutions to quasi-elliptic schemes, secondary instability of free shear flows, simulation of turbulent flow, and problems connected with weather prediction.
Analysis and design of algorithm-based fault-tolerant systems

NASA Technical Reports Server (NTRS)

Nair, V. S. Sukumaran

1990-01-01

An important consideration in the design of high performance multiprocessor systems is to ensure the correctness of the results computed in the presence of transient and intermittent failures. Concurrent error detection and correction have been applied to such systems in order to achieve reliability. Algorithm Based Fault Tolerance (ABFT) was suggested as a cost-effective concurrent error detection scheme. The research was motivated by the complexity involved in the analysis and design of ABFT systems. To that end, a matrix-based model was developed and, based on that, algorithms for both the design and analysis of ABFT systems are formulated. These algorithms are less complex than the existing ones. In order to reduce the complexity further, a hierarchical approach is developed for the analysis of large systems.
Measurement and analysis of workload effects on fault latency in real-time systems

NASA Technical Reports Server (NTRS)

Woodbury, Michael H.; Shin, Kang G.

1990-01-01

The authors demonstrate the need to address fault latency in highly reliable real-time control computer systems. It is noted that the effectiveness of all known recovery mechanisms is greatly reduced in the presence of multiple latent faults. The presence of multiple latent faults increases the possibility of multiple errors, which could result in coverage failure. The authors present experimental evidence indicating that the duration of fault latency is dependent on workload. A synthetic workload generator is used to vary the workload, and a hardware fault injector is applied to inject transient faults of varying durations. This method makes it possible to derive the distribution of fault latency duration. Experimental results obtained from the fault-tolerant multiprocessor at the NASA Airlab are presented and discussed.
Impact of Load Balancing on Unstructured Adaptive Grid Computations for Distributed-Memory Multiprocessors

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak; Simon, Horst D.

1996-01-01

The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a new dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view. Whenever the computational mesh is adapted, JOVE is activated to eliminate the load imbalance. JOVE has been implemented on an IBM SP2 distributed-memory machine in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. We also show that JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
The state of the Java universe

ScienceCinema

Gosling, James

2018-05-22

Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIXÂ®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

Design and initial application of the extended aircraft interrogation and display system: Multiprocessing ground support equipment for digital flight systems

NASA Technical Reports Server (NTRS)

Glover, Richard D.

1987-01-01

A pipelined, multiprocessor, general-purpose ground support equipment for digital flight systems has been developed and placed in service at the NASA Ames Research Center's Dryden Flight Research Facility. The design is an outgrowth of the earlier aircraft interrogation and display system (AIDS) used in support of several research projects to provide engineering-units display of internal control system parameters during development and qualification testing activities. The new system, incorporating multiple 16-bit processors, is called extended AIDS (XAIDS) and is now supporting the X-29A forward-swept-wing aircraft project. This report describes the design and mechanization of XAIDS and shows the steps whereby a typical user may take advantage of its high throughput and flexible features.
Problems in characterizing barrier performance

NASA Technical Reports Server (NTRS)

Jordan, Harry F.

1988-01-01

The barrier is a synchronization construct which is useful in separating a parallel program into parallel sections which are executed in sequence. The completion of a barrier requires cooperation among all executing processes. This requirement not only introduces the wait for the slowest process delay which is inherent in the definition of the synchronization, but also has implications for the efficient implementation and measurement of barrier performance in different systems. Types of barrier implementation and their relationship to different multiprocessor environments are described. Then the problem of measuring the performance of barrier implementations on specific machine architecture is discussed. The fact that the barrier synchronization requires the cooperation of all processes makes the problem of performance measurement similarly global. Making non-intrusive measurements of sufficient accuracy can be tricky on systems offering only rudimentary measurement tools.
Fault tolerant data management system

NASA Technical Reports Server (NTRS)

Gustin, W. M.; Smither, M. A.

1972-01-01

Described in detail are: (1) results obtained in modifying the onboard data management system software to a multiprocessor fault tolerant system; (2) a functional description of the prototype buffer I/O units; (3) description of modification to the ACADC and stimuli generating unit of the DTS; and (4) summaries and conclusions on techniques implemented in the rack and prototype buffers. Also documented is the work done in investigating techniques of high speed (5 Mbps) digital data transmission in the data bus environment. The application considered is a multiport data bus operating with the following constraints: no preferred stations; random bus access by all stations; all stations equally likely to source or sink data; no limit to the number of stations along the bus; no branching of the bus; and no restriction on station placement along the bus.
A method for solution of the Euler-Bernoulli beam equation in flexible-link robotic systems

NASA Technical Reports Server (NTRS)

Tzes, Anthony P.; Yurkovich, Stephen; Langer, F. Dieter

1989-01-01

An efficient numerical method for solving the partial differential equation (PDE) governing the flexible manipulator control dynamics is presented. A finite-dimensional model of the equation is obtained through discretization in both time and space coordinates by using finite-difference approximations to the PDE. An expert program written in the Macsyma symbolic language is utilized in order to embed the boundary conditions into the program, accounting for a mass carried at the tip of the manipulator. The advantages of the proposed algorithm are many, including the ability to (1) include any distributed actuation term in the partial differential equation, (2) provide distributed sensing of the beam displacement, (3) easily modify the boundary conditions through an expert program, and (4) modify the structure for running under a multiprocessor environment.
Optimal mapping of irregular finite element domains to parallel processors

NASA Technical Reports Server (NTRS)

Flower, J.; Otto, S.; Salama, M.

1987-01-01

Mapping the solution domain of n-finite elements into N-subdomains that may be processed in parallel by N-processors is an optimal one if the subdomain decomposition results in a well-balanced workload distribution among the processors. The problem is discussed in the context of irregular finite element domains as an important aspect of the efficient utilization of the capabilities of emerging multiprocessor computers. Finding the optimal mapping is an intractable combinatorial optimization problem, for which a satisfactory approximate solution is obtained here by analogy to a method used in statistical mechanics for simulating the annealing process in solids. The simulated annealing analogy and algorithm are described, and numerical results are given for mapping an irregular two-dimensional finite element domain containing a singularity onto the Hypercube computer.
Development of a model of machine hand eye coordination and program specifications for a topological machine vision system

NASA Technical Reports Server (NTRS)

1972-01-01

A unified approach to computer vision and manipulation is developed which is called choreographic vision. In the model, objects to be viewed by a projected robot in the Viking missions to Mars are seen as objects to be manipulated within choreographic contexts controlled by a multimoded remote, supervisory control system on Earth. A new theory of context relations is introduced as a basis for choreographic programming languages. A topological vision model is developed for recognizing objects by shape and contour. This model is integrated with a projected vision system consisting of a multiaperture image dissector TV camera and a ranging laser system. System program specifications integrate eye-hand coordination and topological vision functions and an aerospace multiprocessor implementation is described.
Software/hardware distributed processing network supporting the Ada environment

NASA Astrophysics Data System (ADS)

Wood, Richard J.; Pryk, Zen

1993-09-01

A high-performance, fault-tolerant, distributed network has been developed, tested, and demonstrated. The network is based on the MIPS Computer Systems, Inc. R3000 Risc for processing, VHSIC ASICs for high speed, reliable, inter-node communications and compatible commercial memory and I/O boards. The network is an evolution of the Advanced Onboard Signal Processor (AOSP) architecture. It supports Ada application software with an Ada- implemented operating system. A six-node implementation (capable of expansion up to 256 nodes) of the RISC multiprocessor architecture provides 120 MIPS of scalar throughput, 96 Mbytes of RAM and 24 Mbytes of non-volatile memory. The network provides for all ground processing applications, has merit for space-qualified RISC-based network, and interfaces to advanced Computer Aided Software Engineering (CASE) tools for application software development.
Current Saturation Avoidance with Real-Time Control using DPCS

NASA Astrophysics Data System (ADS)

Ferrara, M.; Hutchinson, I.; Wolfe, S.; Stillerman, J.; Fredian, T.

2008-11-01

Tokamak ohmic-transformer and equilibrium-field coils need to be able to operate near their maximum current capabilities. However if they reach their upper limit during high-performance discharges or in the presence of a strong off-normal event, shape control is compromised, and instability, even plasma disruptions can result. On Alcator C-Mod we designed and tested an anti-saturation routine which detects the impending saturation of OH and EF currents and interpolates to a neighboring safe equilibrium in real-time. The routine was implemented with a multi-processor, multi-time-scale control scheme, which is based on a master process and multiple asynchronous slave processes. The scheme is general and can be used for any computationally-intensive algorithm. USDoE award DE- FC02-99ER545512.
Dataflow Design Tool: User's Manual

NASA Technical Reports Server (NTRS)

Jones, Robert L., III

1996-01-01

The Dataflow Design Tool is a software tool for selecting a multiprocessor scheduling solution for a class of computational problems. The problems of interest are those that can be described with a dataflow graph and are intended to be executed repetitively on a set of identical processors. Typical applications include signal processing and control law problems. The software tool implements graph-search algorithms and analysis techniques based on the dataflow paradigm. Dataflow analyses provided by the software are introduced and shown to effectively determine performance bounds, scheduling constraints, and resource requirements. The software tool provides performance optimization through the inclusion of artificial precedence constraints among the schedulable tasks. The user interface and tool capabilities are described. Examples are provided to demonstrate the analysis, scheduling, and optimization functions facilitated by the tool.
A discrete decentralized variable structure robotic controller

NASA Technical Reports Server (NTRS)

Tumeh, Zuheir S.

1989-01-01

A decentralized trajectory controller for robotic manipulators is designed and tested using a multiprocessor architecture and a PUMA 560 robot arm. The controller is made up of a nominal model-based component and a correction component based on a variable structure suction control approach. The second control component is designed using bounds on the difference between the used and actual values of the model parameters. Since the continuous manipulator system is digitally controlled along a trajectory, a discretized equivalent model of the manipulator is used to derive the controller. The motivation for decentralized control is that the derived algorithms can be executed in parallel using a distributed, relatively inexpensive, architecture where each joint is assigned a microprocessor. Nonlinear interaction and coupling between joints is treated as a disturbance torque that is estimated and compensated for.
Performability modeling based on real data: A case study

NASA Technical Reports Server (NTRS)

Hsueh, M. C.; Iyer, R. K.; Trivedi, K. S.

1988-01-01

Described is a measurement-based performability model based on error and resource usage data collected on a multiprocessor system. A method for identifying the model structure is introduced and the resulting model is validated against real data. Model development from the collection of raw data to the estimation of the expected reward is described. Both normal and error behavior of the system are characterized. The measured data show that the holding times in key operational and error states are not simple exponentials and that a semi-Markov process is necessary to model system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of apparent types of errors.
[A novel biologic electricity signal measurement based on neuron chip].

PubMed

Lei, Yinsheng; Wang, Mingshi; Sun, Tongjing; Zhu, Qiang; Qin, Ran

2006-06-01

Neuron chip is a multiprocessor with three pipeline CPU; its communication protocol and control processor are integrated in effect to carry out the function of communication, control, attemper, I/O, etc. A novel biologic electronic signal measurement network system is composed of intelligent measurement nodes with neuron chip at the core. In this study, the electronic signals such as ECG, EEG, EMG and BOS can be synthetically measured by those intelligent nodes, and some valuable diagnostic messages are found. Wavelet transform is employed in this system to analyze various biologic electronic signals due to its strong time-frequency ability of decomposing signal local character. Better effect is gained. This paper introduces the hardware structure of network and intelligent measurement node, the measurement theory and the signal figure of data acquisition and processing.
Performability modeling based on real data: A casestudy

NASA Technical Reports Server (NTRS)

Hsueh, M. C.; Iyer, R. K.; Trivedi, K. S.

1987-01-01

Described is a measurement-based performability model based on error and resource usage data collected on a multiprocessor system. A method for identifying the model structure is introduced and the resulting model is validated against real data. Model development from the collection of raw data to the estimation of the expected reward is described. Both normal and error behavior of the system are characterized. The measured data show that the holding times in key operational and error states are not simple exponentials and that a semi-Markov process is necessary to model the system behavior. A reward function, based on the service rate and the error rate in each state, is then defined in order to estimate the performability of the system and to depict the cost of different types of errors.
A new taxonomy for distributed computer systems based upon operating system structure

NASA Technical Reports Server (NTRS)

Foudriat, E. C.

1985-01-01

Characteristics of the resource structure found in the operating system are considered as a mechanism for classifying distributed computer systems. Since the operating system resources, themselves, are too diversified to provide a consistent classification, the structure upon which resources are built and shared are examined. The location and control character of this indivisibility provides the taxonomy for separating uniprocessors, computer networks, network computers (fully distributed processing systems or decentralized computers) and algorithm and/or data control multiprocessors. The taxonomy is important because it divides machines into a classification that is relevant or important to the client and not the hardware architect. It also defines the character of the kernel O/S structure needed for future computer systems. What constitutes an operating system for a fully distributed processor is discussed in detail.
Dynamic configuration management of a multi-standard and multi-mode reconfigurable multi-ASIP architecture for turbo decoding

NASA Astrophysics Data System (ADS)

Lapotre, Vianney; Gogniat, Guy; Baghdadi, Amer; Diguet, Jean-Philippe

2017-12-01

The multiplication of connected devices goes along with a large variety of applications and traffic types needing diverse requirements. Accompanying this connectivity evolution, the last years have seen considerable evolutions of wireless communication standards in the domain of mobile telephone networks, local/wide wireless area networks, and Digital Video Broadcasting (DVB). In this context, intensive research has been conducted to provide flexible turbo decoder targeting high throughput, multi-mode, multi-standard, and power consumption efficiency. However, flexible turbo decoder implementations have not often considered dynamic reconfiguration issues in this context that requires high speed configuration switching. Starting from this assessment, this paper proposes the first solution that allows frame-by-frame run-time configuration management of a multi-processor turbo decoder without compromising the decoding performances.
Problems related to the integration of fault tolerant aircraft electronic systems

NASA Technical Reports Server (NTRS)

Bannister, J. A.; Adlakha, V.; Triyedi, K.; Alspaugh, T. A., Jr.

1982-01-01

Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of inegrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included.
Multiprocessor speed-up, Amdahl's Law, and the Activity Set Model of parallel program behavior

NASA Technical Reports Server (NTRS)

Gelenbe, Erol

1988-01-01

An important issue in the effective use of parallel processing is the estimation of the speed-up one may expect as a function of the number of processors used. Amdahl's Law has traditionally provided a guideline to this issue, although it appears excessively pessimistic in the light of recent experimental results. In this note, Amdahl's Law is amended by giving a greater importance to the capacity of a program to make effective use of parallel processing, but also recognizing the fact that imbalance of the workload of each processor is bound to occur. An activity set model of parallel program behavior is then introduced along with the corresponding parallelism index of a program, leading to upper and lower bounds to the speed-up.
Comparing barrier algorithms

NASA Technical Reports Server (NTRS)

Arenstorf, Norbert S.; Jordan, Harry F.

1987-01-01

A barrier is a method for synchronizing a large number of concurrent computer processes. After considering some basic synchronization mechanisms, a collection of barrier algorithms with either linear or logarithmic depth are presented. A graphical model is described that profiles the execution of the barriers and other parallel programming constructs. This model shows how the interaction between the barrier algorithms and the work that they synchronize can impact their performance. One result is that logarithmic tree structured barriers show good performance when synchronizing fixed length work, while linear self-scheduled barriers show better performance when synchronizing fixed length work with an imbedded critical section. The linear barriers are better able to exploit the process skew associated with critical sections. Timing experiments, performed on an eighteen processor Flex/32 shared memory multiprocessor, that support these conclusions are detailed.
Asynchronous Data Retrieval from an Object-Oriented Database

NASA Astrophysics Data System (ADS)

Gilbert, Jonathan P.; Bic, Lubomir

We present an object-oriented semantic database model which, similar to other object-oriented systems, combines the virtues of four concepts: the functional data model, a property inheritance hierarchy, abstract data types and message-driven computation. The main emphasis is on the last of these four concepts. We describe generic procedures that permit queries to be processed in a purely message-driven manner. A database is represented as a network of nodes and directed arcs, in which each node is a logical processing element, capable of communicating with other nodes by exchanging messages. This eliminates the need for shared memory and for centralized control during query processing. Hence, the model is suitable for implementation on a multiprocessor computer architecture, consisting of large numbers of loosely coupled processing elements.
Simplifying and speeding the management of intra-node cache coherence

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton on Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Phillip [Cortlandt Manor, NY; Hoenicke, Dirk [Ossining, NY; Ohmacht, Martin [Yorktown Heights, NY

2012-04-17

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an area of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.

Real-time dynamics simulation of the Cassini spacecraft using DARTS. Part 1: Functional capabilities and the spatial algebra algorithm

NASA Technical Reports Server (NTRS)

Jain, A.; Man, G. K.

1993-01-01

This paper describes the Dynamics Algorithms for Real-Time Simulation (DARTS) real-time hardware-in-the-loop dynamics simulator for the National Aeronautics and Space Administration's Cassini spacecraft. The spacecraft model consists of a central flexible body with a number of articulated rigid-body appendages. The demanding performance requirements from the spacecraft control system require the use of a high fidelity simulator for control system design and testing. The DARTS algorithm provides a new algorithmic and hardware approach to the solution of this hardware-in-the-loop simulation problem. It is based upon the efficient spatial algebra dynamics for flexible multibody systems. A parallel and vectorized version of this algorithm is implemented on a low-cost, multiprocessor computer to meet the simulation timing requirements.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Gosling, James

Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original designmore » of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.« less
Exploiting parallel computing with limited program changes using a network of microcomputers

NASA Technical Reports Server (NTRS)

Rogers, J. L., Jr.; Sobieszczanski-Sobieski, J.

1985-01-01

Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronus processing was studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.
A Scheduling Algorithm for Replicated Real-Time Tasks

NASA Technical Reports Server (NTRS)

Yu, Albert C.; Lin, Kwei-Jay

1991-01-01

We present an algorithm for scheduling real-time periodic tasks on a multiprocessor system under fault-tolerant requirement. Our approach incorporates both the redundancy and masking technique and the imprecise computation model. Since the tasks in hard real-time systems have stringent timing constraints, the redundancy and masking technique are more appropriate than the rollback techniques which usually require extra time for error recovery. The imprecise computation model provides flexible functionality by trading off the quality of the result produced by a task with the amount of processing time required to produce it. It therefore permits the performance of a real-time system to degrade gracefully. We evaluate the algorithm by stochastic analysis and Monte Carlo simulations. The results show that the algorithm is resilient under hardware failures.
Managing coherence via put/get windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blumrich, Matthias A; Chen, Dong; Coteus, Paul W

A method and apparatus for managing coherence between two processors of a two processor node of a multi-processor computer system. Generally the present invention relates to a software algorithm that simplifies and significantly speeds the management of cache coherence in a message passing parallel computer, and to hardware apparatus that assists this cache coherence algorithm. The software algorithm uses the opening and closing of put/get windows to coordinate the activated required to achieve cache coherence. The hardware apparatus may be an extension to the hardware address decode, that creates, in the physical memory address space of the node, an areamore » of virtual memory that (a) does not actually exist, and (b) is therefore able to respond instantly to read and write requests from the processing elements.« less
Portable parallel stochastic optimization for the design of aeropropulsion components

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Rhodes, G. S.

1994-01-01

This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
By Hand or Not By-Hand: A Case Study of Alternative Approaches to Parallelize CFD Applications

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Bailey, David (Technical Monitor)

1997-01-01

While parallel processing promises to speed up applications by several orders of magnitude, the performance achieved still depends upon several factors, including the multiprocessor architecture, system software, data distribution and alignment, as well as the methods used for partitioning the application and mapping its components onto the architecture. The existence of the Gorden Bell Prize given out at Supercomputing every year suggests that while good performance can be attained for real applications on general purpose multiprocessors, the large investment in man-power and time still has to be repeated for each application-machine combination. As applications and machine architectures become more complex, the cost and time-delays for obtaining performance by hand will become prohibitive. Computer users today can turn to three possible avenues for help: parallel libraries, parallel languages and compilers, interactive parallelization tools. The success of these methodologies, in turn, depends on proper application of data dependency analysis, program structure recognition and transformation, performance prediction as well as exploitation of user supplied knowledge. NASA has been developing multidisciplinary applications on highly parallel architectures under the High Performance Computing and Communications Program. Over the past six years, the transition of underlying hardware and system software have forced the scientists to spend a large effort to migrate and recede their applications. Various attempts to exploit software tools to automate the parallelization process have not produced favorable results. In this paper, we report our most recent experience with CAPTOOL, a package developed at Greenwich University. We have chosen CAPTOOL for three reasons: 1. CAPTOOL accepts a FORTRAN 77 program as input. This suggests its potential applicability to a large collection of legacy codes currently in use. 2. CAPTOOL employs domain decomposition to obtain parallelism. Although the fact that not all kinds of parallelism are handled may seem unappealing, many NASA applications in computational aerosciences as well as earth and space sciences are amenable to domain decomposition. 3. CAPTOOL generates code for a large variety of environments employed across NASA centers: MPI/PVM on network of workstations to the IBS/SP2 and CRAY/T3D.
Instrumented Architectural Simulation System

NASA Technical Reports Server (NTRS)

Delagi, B. A.; Saraiya, N.; Nishimura, S.; Byrd, G.

1987-01-01

Simulation of systems at an architectural level can offer an effective way to study critical design choices if (1) the performance of the simulator is adequate to examine designs executing significant code bodies, not just toy problems or small application fragements, (2) the details of the simulation include the critical details of the design, (3) the view of the design presented by the simulator instrumentation leads to useful insights on the problems with the design, and (4) there is enough flexibility in the simulation system so that the asking of unplanned questions is not suppressed by the weight of the mechanics involved in making changes either in the design or its measurement. A simulation system with these goals is described together with the approach to its implementation. Its application to the study of a particular class of multiprocessor hardware system architectures is illustrated.
Design and evaluation of a DAMQ multiprocessor network with self-compacting buffers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Park, J.; O`Krafka, B.W.O.; Vassiliadis, S.

1994-12-31

This paper describes a new approach to implement Dynamically Allocated Multi-Queue (DAMQ) switching elements using a technique called ``self-compacting buffers``. This technique is efficient in that the amount of hardware required to manage the buffers is relatively small; it offers high performance since it is an implementation of a DAMQ. The first part of this paper describes the self-compacting buffer architecture in detail, and compares it against a competing DAMQ switch design. The second part presents extensive simulation results comparing the performance of a self compacting buffer switch against an ideal switch including several examples of k-ary n-cubes and deltamore » networks. In addition, simulation results show how the performance of an entire network can be quickly and accurately approximated by simulating just a single switching element.« less
Comparative Implementation of High Performance Computing for Power System Dynamic Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jin, Shuangshuang; Huang, Zhenyu; Diao, Ruisheng

Dynamic simulation for transient stability assessment is one of the most important, but intensive, computations for power system planning and operation. Present commercial software is mainly designed for sequential computation to run a single simulation, which is very time consuming with a single processer. The application of High Performance Computing (HPC) to dynamic simulations is very promising in accelerating the computing process by parallelizing its kernel algorithms while maintaining the same level of computation accuracy. This paper describes the comparative implementation of four parallel dynamic simulation schemes in two state-of-the-art HPC environments: Message Passing Interface (MPI) and Open Multi-Processing (OpenMP).more » These implementations serve to match the application with dedicated multi-processor computing hardware and maximize the utilization and benefits of HPC during the development process.« less
Statistical methodologies for the control of dynamic remapping

NASA Technical Reports Server (NTRS)

Saltz, J. H.; Nicol, D. M.

1986-01-01

Following an initial mapping of a problem onto a multiprocessor machine or computer network, system performance often deteriorates with time. In order to maintain high performance, it may be necessary to remap the problem. The decision to remap must take into account measurements of performance deterioration, the cost of remapping, and the estimated benefits achieved by remapping. We examine the tradeoff between the costs and the benefits of remapping two qualitatively different kinds of problems. One problem assumes that performance deteriorates gradually, the other assumes that performance deteriorates suddenly. We consider a variety of policies for governing when to remap. In order to evaluate these policies, statistical models of problem behaviors are developed. Simulation results are presented which compare simple policies with computationally expensive optimal decision policies; these results demonstrate that for each problem type, the proposed simple policies are effective and robust.
libdrdc: software standards library

NASA Astrophysics Data System (ADS)

Erickson, David; Peng, Tie

2008-04-01

This paper presents the libdrdc software standards library including internal nomenclature, definitions, units of measure, coordinate reference frames, and representations for use in autonomous systems research. This library is a configurable, portable C-function wrapped C++ / Object Oriented C library developed to be independent of software middleware, system architecture, processor, or operating system. It is designed to use the automatically-tuned linear algebra suite (ATLAS) and Basic Linear Algebra Suite (BLAS) and port to firmware and software. The library goal is to unify data collection and representation for various microcontrollers and Central Processing Unit (CPU) cores and to provide a common Application Binary Interface (ABI) for research projects at all scales. The library supports multi-platform development and currently works on Windows, Unix, GNU/Linux, and Real-Time Executive for Multiprocessor Systems (RTEMS). This library is made available under LGPL version 2.1 license.
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations

NASA Technical Reports Server (NTRS)

Chrisochoides, Nikos

1995-01-01

We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency in the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used an a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
The BLAZE language: A parallel language for scientific programming

NASA Technical Reports Server (NTRS)

Mehrotra, P.; Vanrosendale, J.

1985-01-01

A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with onceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described and shows how this language would be used in typical scientific programming.
VxWorks 6.9 for LEON

NASA Astrophysics Data System (ADS)

Cederman, Daniel; Hellstrom, Daniel

2016-08-01

The VxWorks operating system together with the Cobham Grislier LEON architectural port provides an efficient platform for the development of software for space applications. It supports both uni-and multiprocessor mode (SMP or AMP) and comes with an integrated development environment with several debugging and analysis tools. The LEON architectural port from Cobham Grislier supports LEON2/3/4 systems and includes drivers for all standard on-chip peripherals, as well as support for RASTA boards. In this paper we will highlight some the many features of VxWorks and the LEON architectural port. The latest version of the architectural port now supports VxWorks 6.9 (the previous version was for VxWorks 6.7) and has the support for the GR740, the commercially available quad-core LEON system, designed as the European Space Agency's Next Generation Microprocessor (NGMP).
A pluggable framework for parallel pairwise sequence search.

PubMed

Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli

2007-01-01

The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.
Parallel architecture for rapid image generation and analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nerheim, R.J.

1987-01-01

A multiprocessor architecture inspired by the Disney multiplane camera is proposed. For many applications, this approach produces a natural mapping of processors to objects in a scene. Such a mapping promotes parallelism and reduces the hidden-surface work with minimal interprocessor communication and low-overhead cost. Existing graphics architectures store the final picture as a monolithic entity. The architecture here stores each object's image separately. It assembles the final composite picture from component images only when the video display needs to be refreshed. This organization simplifies the work required to animate moving objects that occlude other objects. In addition, the architecture hasmore » multiple processors that generate the component images in parallel. This further shortens the time needed to create a composite picture. In addition to generating images for animation, the architecture has the ability to decompose images.« less
Hyperswitch communication network

NASA Technical Reports Server (NTRS)

Peterson, J.; Pniel, M.; Upchurch, E.

1991-01-01

The Hyperswitch Communication Network (HCN) is a large scale parallel computer prototype being developed at JPL. Commercial versions of the HCN computer are planned. The HCN computer being designed is a message passing multiple instruction multiple data (MIMD) computer, and offers many advantages in price-performance ratio, reliability and availability, and manufacturing over traditional uniprocessors and bus based multiprocessors. The design of the HCN operating system is a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state of the art microprocessors. The HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed.
Object impedance control for cooperative manipulation - Theory and experimental results

NASA Technical Reports Server (NTRS)

Schneider, Stanley A.; Cannon, Robert H., Jr.

1992-01-01

This paper presents the dynamic control module of the Dynamic and Strategic Control of Cooperating Manipulators (DASCCOM) project at Stanford University's Aerospace Robotics Laboratory. First, the cooperative manipulation problem is analyzed from a systems perspective, and the desirable features of a control system for cooperative manipulation are discussed. Next, a control policy is developed that enforces a controlled impedance not of the individual arm endpoints, but of the manipulated object itself. A parallel implementation for a multiprocessor system is presented. The controller fully compensates for the system dynamics and directly controls the object internal forces. Most importantly, it presents a simple, powerful, intuitive interface to higher level strategic control modules. Experimental results from a dual two-link-arm robotic system are used to compare the object impedance controller with other strategies, both for free-motion slews and environmental contact.
Strategies for concurrent processing of complex algorithms in data driven architectures

NASA Technical Reports Server (NTRS)

Stoughton, John W.; Mielke, Roland R.

1988-01-01

Research directed at developing a graph theoretical model for describing data and control flow associated with the execution of large grained algorithms in a special distributed computer environment is presented. This model is identified by the acronym ATAMM which represents Algorithms To Architecture Mapping Model. The purpose of such a model is to provide a basis for establishing rules for relating an algorithm to its execution in a multiprocessor environment. Specifications derived from the model lead directly to the description of a data flow architecture which is a consequence of the inherent behavior of the data and control flow described by the model. The purpose of the ATAMM based architecture is to provide an analytical basis for performance evaluation. The ATAMM model and architecture specifications are demonstrated on a prototype system for concept validation.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.