Approximation algorithms for scheduling unrelated parallel machines with release dates
NASA Astrophysics Data System (ADS)
Avdeenko, T. V.; Mesentsev, Y. A.; Estraykh, I. V.
2017-01-01
In this paper we propose approaches to optimal scheduling of unrelated parallel machines with release dates. One approach is based on a dynamic programming scheme modified with adaptive narrowing of the search domain, which ensures its computational effectiveness. We discuss the complexity of synthesizing exact schedules and compare it with approximate, close-to-optimal solutions. We also explain how the algorithm works using an example of two unrelated parallel machines and five jobs with release dates. Performance results that show the efficiency of the proposed approach are given.
Scheduling Jobs with Variable Job Processing Times on Unrelated Parallel Machines
Zhang, Guang-Qian; Wang, Jian-Jun; Liu, Ya-Jing
2014-01-01
Scheduling problems on m unrelated parallel machines with variable job processing times are considered, where the processing time of a job is a function of its position in a sequence, its starting time, and its resource allocation. The objective is to determine the optimal resource allocation and the optimal schedule to minimize a total cost function that depends on the total completion (waiting) time, the total machine load, the total absolute differences in completion (waiting) times on all machines, and the total resource cost. If the number of machines is a given constant, we propose a polynomial time algorithm to solve the problem. PMID:24982933
Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization
NASA Technical Reports Server (NTRS)
Jones, James Patton; Nitzberg, Bill
1999-01-01
The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FIFO first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet, utilization was affected little. In particular these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.
A parallel-machine scheduling problem with two competing agents
NASA Astrophysics Data System (ADS)
Lee, Wen-Chiung; Chung, Yu-Hsiang; Wang, Jen-Ya
2017-06-01
Scheduling with two competing agents has become popular in recent years. Most of the research has focused on single-machine problems. This article considers a parallel-machine problem, the objective of which is to minimize the total completion time of jobs from the first agent given that the maximum tardiness of jobs from the second agent cannot exceed an upper bound. The NP-hardness of this problem is also examined. A genetic algorithm equipped with local search is proposed to search for the near-optimal solution. Computational experiments are conducted to evaluate the proposed genetic algorithm.
Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S
2013-01-01
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware, as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the bad scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time of the PDES-optimized scheduler relative to the regular VM scheduler, with over 20 reduction in run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
A new scheduling algorithm for parallel sparse LU factorization with static pivoting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grigori, Laura; Li, Xiaoye S.
2002-08-20
In this paper we present a static scheduling algorithm for parallel sparse LU factorization with static pivoting. The algorithm is divided into mapping and scheduling phases, using the symmetric pruned graphs of L' and U to represent dependencies. The scheduling algorithm is designed for driving the parallel execution of the factorization on a distributed-memory architecture. Experimental results and comparisons with SuperLU_DIST are reported after applying this algorithm to real-world application matrices on an IBM SP RS/6000 distributed-memory machine.
Efficient Bounding Schemes for the Two-Center Hybrid Flow Shop Scheduling Problem with Removal Times
Hidri, Lotfi; Gharbi, Anis; Louly, Mohamed Aly
2014-01-01
We focus on the two-center hybrid flow shop scheduling problem with identical parallel machines and removal times. The job removal time is the required duration to remove it from a machine after its processing. The objective is to minimize the maximum completion time (makespan). A heuristic and a lower bound are proposed for this NP-Hard problem. These procedures are based on the optimal solution of the parallel machine scheduling problem with release dates and delivery times. The heuristic is composed of two phases. The first one is a constructive phase in which an initial feasible solution is provided, while the second phase is an improvement one. Intensive computational experiments have been conducted to confirm the good performance of the proposed procedures. PMID:25610911
On-Line Scheduling of Parallel Machines
1990-11-01
machine without losing any work; this is referred to as the preemptive model. In contrast to the nonpreemptive model which we have considered in this paper...that there exists no schedule of length d. The 2-relaxed decision procedure is as follows. Put each job into the queue of the slowest machine Mk such...in their queues . If a machine’s queue is empty it takes jobs to process from the queue of the first machine that is slower than it and that has a
NASA Astrophysics Data System (ADS)
Hsiao, Ming-Chih; Su, Ling-Huey
2018-02-01
This research addresses the problem of scheduling hybrid machine types, in which one type is a two-machine flowshop and the other is a single machine. A job is processed either on the two-machine flowshop or on the single machine. The objective is to determine a production schedule for all jobs so as to minimize the makespan. The problem is NP-hard, since the two-parallel-machine problem was proved to be NP-hard. Simulated annealing (SA) algorithms are developed to solve the problem optimally. A mixed integer programming (MIP) model is developed and used to evaluate the performance of the two SAs. Computational experiments demonstrate the efficiency of the simulated annealing algorithms; the quality of their solutions is also reported.
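To make the objective concrete, the following minimal sketch evaluates the makespan of one given assignment of jobs to the two machine types. It is not the authors' simulated annealing or MIP model. It assumes a single two-machine flowshop sequenced by Johnson's rule and a single machine that simply runs its jobs back to back; the function names and the sample data are hypothetical.

```python
def johnson_order(jobs):
    """Sequence (a, b) jobs for a two-machine flowshop with Johnson's rule.

    a is the time on the first flowshop machine, b the time on the second.
    """
    # Jobs that are faster on machine 1 go first, by increasing a.
    first = sorted((j for j in jobs if j[0] <= j[1]), key=lambda j: j[0])
    # The remaining jobs go last, by decreasing b.
    last = sorted((j for j in jobs if j[0] > j[1]), key=lambda j: j[1], reverse=True)
    return first + last


def flowshop_makespan(sequence):
    """Completion time of the last job on the second flowshop machine."""
    t1 = t2 = 0
    for a, b in sequence:
        t1 += a                 # machine 1 works without idle time
        t2 = max(t2, t1) + b    # machine 2 waits for the job and for itself
    return t2


def hybrid_makespan(flowshop_jobs, single_machine_times):
    """Makespan of an assignment: the later of the two machine types."""
    flow = flowshop_makespan(johnson_order(flowshop_jobs))
    single = sum(single_machine_times)
    return max(flow, single)


# Hypothetical instance: three jobs on the flowshop, two on the single machine.
flowshop_jobs = [(3, 2), (1, 4), (2, 2)]
single_machine_times = [5, 3]
print(hybrid_makespan(flowshop_jobs, single_machine_times))
```

A search method such as simulated annealing would call an evaluation like `hybrid_makespan` on each candidate assignment; how the candidates are generated is outside this sketch.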
NASA Astrophysics Data System (ADS)
Amallynda, I.; Santosa, B.
2017-11-01
This paper proposes a new generalization of the distributed parallel machine and assembly scheduling problem (DPMASP) with eligibility constraints, referred to as the modified distributed parallel machine and assembly scheduling problem (MDPMASP) with eligibility constraints. Within this generalization, we assume that there is a set of non-identical factories or production lines, each with a set of unrelated parallel machines of different speeds that feed a single assembly machine in series. A set of different products is manufactured through an assembly program of a set of components (jobs) according to the requested demand. Each product requires several kinds of jobs with different sizes. In addition, we consider the multi-objective problem (MOP) of minimizing the mean flow time and the number of tardy products simultaneously. This problem is known to be NP-hard and is important in practice, as these criteria reflect the customer's demand and the manufacturer's perspective. Because this is a realistic and complex problem with a wide range of possible solutions, we propose four simple heuristics and two metaheuristics to solve it. Various parameters of the proposed metaheuristic algorithms are discussed and calibrated by means of the Taguchi technique. All proposed algorithms are tested in Matlab. Our computational experiments indicate that the proposed problem and fourth proposed algorithms can be implemented and used to solve moderately sized instances, giving efficient solutions that are close to the optimum in most cases.
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Lin, C. T.
1989-01-01
The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
Linux Kernel Co-Scheduling For Bulk Synchronous Parallel Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R
2011-01-01
This paper describes a kernel scheduling algorithm that is based on co-scheduling principles and that is intended for parallel applications running on 1000 cores or more where inter-node scalability is key. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Linux Kernel Co-Scheduling and Bulk Synchronous Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R
2012-01-01
This paper describes a kernel scheduling algorithm that is based on coscheduling principles and that is intended for parallel applications running on 1000 cores or more. Experimental results for a Linux implementation on a Cray XT5 machine are presented. The results indicate that Linux is a suitable operating system for this new scheduling scheme, and that this design provides a dramatic improvement in scaling performance for synchronizing collective operations at scale.
Constraint-Based Scheduling System
NASA Technical Reports Server (NTRS)
Zweben, Monte; Eskey, Megan; Stock, Todd; Taylor, Will; Kanefsky, Bob; Drascher, Ellen; Deale, Michael; Daun, Brian; Davis, Gene
1995-01-01
Report describes continuing development of software for constraint-based scheduling system implemented eventually on massively parallel computer. Based on machine learning as means of improving scheduling. Designed to learn when to change search strategy by analyzing search progress and learning general conditions under which resource bottleneck occurs.
Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, J; Ma, X; Singh, K
2008-10-09
With the emerging many-core paradigm, parallel programming must extend beyond its traditional realm of scientific applications. Converting existing sequential applications as well as developing next-generation software requires assistance from hardware, compilers and runtime systems to exploit parallelism transparently within applications. These systems must decompose applications into tasks that can be executed in parallel and then schedule those tasks to minimize load imbalance. However, many systems lack a priori knowledge about the execution time of all tasks to perform effective load balancing with low scheduling overhead. In this paper, we approach this fundamental problem using machine learning techniques first to generate performance models for all tasks and then applying those models to perform automatic performance prediction across program executions. We also extend an existing scheduling algorithm to use generated task cost estimates for online task partitioning and scheduling. We implement the above techniques in the pR framework, which transparently parallelizes scripts in the popular R language, and evaluate their performance and overhead with both a real-world application and a large number of synthetic representative test scripts. Our experimental results show that our proposed approach significantly improves task partitioning and scheduling, with maximum improvements of 21.8%, 40.3% and 22.1% and average improvements of 15.9%, 16.9% and 4.2% for LMM (a real R application) and synthetic test cases with independent and dependent tasks, respectively.
A hybrid dynamic harmony search algorithm for identical parallel machines scheduling
NASA Astrophysics Data System (ADS)
Chen, Jing; Pan, Quan-Ke; Wang, Ling; Li, Jun-Qing
2012-02-01
In this article, a dynamic harmony search (DHS) algorithm is proposed for the identical parallel machines scheduling problem with the objective to minimize makespan. First, an encoding scheme based on a list scheduling rule is developed to convert the continuous harmony vectors to discrete job assignments. Second, the whole harmony memory (HM) is divided into multiple small-sized sub-HMs, and each sub-HM performs evolution independently and exchanges information with others periodically by using a regrouping schedule. Third, a novel improvisation process is applied to generate a new harmony by making use of the information of harmony vectors in each sub-HM. Moreover, a local search strategy is presented and incorporated into the DHS algorithm to find promising solutions. Simulation results show that the hybrid DHS (DHS_LS) is very competitive in comparison to its competitors in terms of mean performance and average computational time.
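The encoding step, which turns a continuous harmony vector into a discrete job assignment, can be illustrated with a short sketch. This is only one plausible reading, not the decoding used in the article: it assumes jobs are ranked by ascending vector value (a random-key style decoding) and then placed by a list scheduling rule on the currently least-loaded machine. The function name and sample data are hypothetical.

```python
import heapq


def decode_harmony(harmony, processing_times, num_machines):
    """Decode a continuous vector into a job-to-machine assignment.

    Assumption: a smaller vector value means the job is scheduled earlier.
    Each job in that order goes to the machine with the smallest load.
    """
    order = sorted(range(len(harmony)), key=lambda j: harmony[j])
    # Min-heap of (current load, machine index).
    loads = [(0, m) for m in range(num_machines)]
    heapq.heapify(loads)
    assignment = [None] * len(harmony)
    for job in order:
        load, machine = heapq.heappop(loads)
        assignment[job] = machine
        heapq.heappush(loads, (load + processing_times[job], machine))
    makespan = max(load for load, _ in loads)
    return assignment, makespan


# Hypothetical instance: four jobs, two identical machines.
harmony = [0.42, 0.07, 0.91, 0.33]
processing_times = [3, 5, 2, 4]
assignment, makespan = decode_harmony(harmony, processing_times, num_machines=2)
print(assignment, makespan)
```

With a decoding of this kind, the search can operate entirely on real-valued vectors while every vector still maps to a feasible schedule.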
NASA Astrophysics Data System (ADS)
Wang, Li-Chih; Chen, Yin-Yann; Chen, Tzu-Li; Cheng, Chen-Yang; Chang, Chin-Wei
2014-10-01
This paper studies a solar cell industry scheduling problem, which is similar to traditional hybrid flowshop scheduling (HFS). In a typical HFS problem, the allocation of machine resources for each order should be scheduled in advance. However, the challenge in solar cell manufacturing is the number of machines that can be adjusted dynamically to complete the job. An optimal production scheduling model is developed to explore these issues, considering the practical characteristics, such as hybrid flowshop, parallel machine system, dedicated machines, sequence independent job setup times and sequence dependent job setup times. The objective of this model is to minimise the makespan and to decide the processing sequence of the orders/lots in each stage, lot-splitting decisions for the orders and the number of machines used to satisfy the demands in each stage. From the experimental results, lot-splitting has significant effect on shortening the makespan, and the improvement effect is influenced by the processing time and the setup time of orders. Therefore, the threshold point to improve the makespan can be identified. In addition, the model also indicates that more lot-splitting approaches, that is, the flexibility of allocating orders/lots to machines is larger, will result in a better scheduling performance.
Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk
2014-01-01
We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms. PMID:24977204
Parallel-Batch Scheduling and Transportation Coordination with Waiting Time Constraint
Gong, Hua; Chen, Daheng; Xu, Ke
2014-01-01
This paper addresses a parallel-batch scheduling problem that incorporates transportation of raw materials or semifinished products before processing, with a waiting time constraint. The orders located at different suppliers are transported by vehicles to a manufacturing facility for further processing. One vehicle can load only one order per shipment. Each order arriving at the facility must be processed within the limited waiting time. The orders are processed in batches on a parallel-batch machine, where a batch contains several orders and the processing time of the batch is the largest processing time of the orders in it. The goal is to find a schedule that minimizes the sum of the total flow time and the production cost. We prove that the general problem is NP-hard in the strong sense. We also demonstrate that the problem with equal processing times on the machine is NP-hard. Furthermore, a pseudopolynomial-time dynamic programming algorithm is provided, which shows that this case is NP-hard in the ordinary sense. An optimal polynomial-time algorithm is presented to solve a special case with equal processing times and equal transportation times for each order. PMID:24883385
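The batch processing rule described above (a batch takes as long as its longest order) can be shown with a small sketch. It deliberately ignores transportation, arrival times, the waiting time constraint and the production cost, and assumes the batches run back to back from time zero on one parallel-batch machine; the function names and data are hypothetical.

```python
def batch_processing_time(batch):
    """Processing time of a batch: the largest processing time in it."""
    return max(batch)


def batch_completion_times(batches):
    """Completion time of each batch when batches run consecutively from t = 0."""
    completion = []
    clock = 0
    for batch in batches:
        clock += batch_processing_time(batch)
        completion.append(clock)
    return completion


# Hypothetical instance: two batches of order processing times.
batches = [[3, 5, 2], [4, 1]]
print(batch_completion_times(batches))
```

Every order in a batch completes when the batch completes, which is why grouping orders with similar processing times tends to waste less machine time.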
NASA Astrophysics Data System (ADS)
Setiawan, A.; Wangsaputra, R.; Martawirya, Y. Y.; Halim, A. H.
2016-02-01
This paper deals with Flexible Manufacturing System (FMS) production rescheduling due to unavailability of cutting tools, caused by either cutting tool failure or the tool life limit. The FMS consists of identical parallel machines integrated with an automatic material handling system, and it runs fully automatically. Each machine has the same cutting tool configuration, which consists of different geometrical cutting tool types in each tool magazine. A job usually takes two stages. Each stage has sequential operations allocated to machines considering the cutting tool life. In practice, a cutting tool can fail before its tool life is reached. The objective of this paper is to develop a dynamic scheduling algorithm for the case in which a cutting tool breaks during unmanned operation and rescheduling is needed. The algorithm consists of four steps: generating the initial schedule, determining the cutting tool failure time, determining the system status at the cutting tool failure time, and rescheduling the unfinished jobs. The approaches to solving the problem are complete-reactive scheduling and robust-proactive scheduling. The new schedules differ from the initial schedule in the starting and completion times of each operation.
Linux OS Jitter Measurements at Large Node Counts using a BlueGene/L
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jones, Terry R; Tauferner, Mr. Andrew; Inglett, Mr. Todd
2010-01-01
We present experimental results for a coordinated scheduling implementation of the Linux operating system. Results were collected on an IBM Blue Gene/L machine at scales up to 16K nodes. Our results indicate coordinated scheduling was able to provide a dramatic improvement in scaling performance for two applications characterized as bulk synchronous parallel programs.
Third Conference on Artificial Intelligence for Space Applications, part 1
NASA Technical Reports Server (NTRS)
Denton, Judith S. (Compiler); Freeman, Michael S. (Compiler); Vereen, Mary (Compiler)
1987-01-01
The application of artificial intelligence to spacecraft and aerospace systems is discussed. Expert systems, robotics, space station automation, fault diagnostics, parallel processing, knowledge representation, scheduling, man-machine interfaces and neural nets are among the topics discussed.
Single product lot-sizing on unrelated parallel machines with non-decreasing processing times
NASA Astrophysics Data System (ADS)
Eremeev, A.; Kovalyov, M.; Kuznetsov, P.
2018-01-01
We consider a problem in which at least a given quantity of a single product has to be partitioned into lots, and lots have to be assigned to unrelated parallel machines for processing. In one version of the problem, the maximum machine completion time should be minimized, in another version of the problem, the sum of machine completion times is to be minimized. Machine-dependent lower and upper bounds on the lot size are given. The product is either assumed to be continuously divisible or discrete. The processing time of each machine is defined by an increasing function of the lot volume, given as an oracle. Setup times and costs are assumed to be negligibly small, and therefore, they are not considered. We derive optimal polynomial time algorithms for several special cases of the problem. An NP-hard case is shown to admit a fully polynomial time approximation scheme. An application of the problem in energy efficient processors scheduling is considered.
NASA Astrophysics Data System (ADS)
Konno, Yohko; Suzuki, Keiji
This paper describes an approach to developing a general-purpose solution algorithm for large-scale problems, using “Local Clustering Organization (LCO)” as a new solution method for the job-shop scheduling problem (JSP). Building on the effective large-scale scheduling performance observed in earlier studies of LCO, we examine how JSP can be solved so that better solutions are obtained stably. To improve solution performance for JSP, the optimization process of LCO is examined, and the scheduling solution structure is extended to a new solution structure based on machine division. A solution method that introduces effective local clustering into this solution structure is proposed as an extended LCO. The extended LCO improves the schedule evaluation efficiently through clustering with a parallel search that extends over multiple machines. Applying the extended LCO to problems of various scales showed that it contributes to minimizing the makespan and to improving the stability of performance.
Meta-RaPS Algorithm for the Aerial Refueling Scheduling Problem
NASA Technical Reports Server (NTRS)
Kaplan, Sezgin; Arin, Arif; Rabadi, Ghaith
2011-01-01
The Aerial Refueling Scheduling Problem (ARSP) can be defined as determining the refueling completion times for each fighter aircraft (job) on multiple tankers (machines). ARSP assumes that jobs have different release times and due dates. The total weighted tardiness is used to evaluate a schedule's quality. Therefore, ARSP can be modeled as a parallel machine scheduling problem with release times and due dates to minimize the total weighted tardiness. Since ARSP is NP-hard, it is more appropriate to develop an approximate or heuristic algorithm to obtain solutions in reasonable computation times. In this paper, a Meta-RaPS-ATC algorithm is implemented to create high-quality solutions. Meta-RaPS (Meta-heuristic for Randomized Priority Search) is a recent and promising metaheuristic that is applied by introducing randomness into a construction heuristic. The Apparent Tardiness Cost (ATC) rule, which is a good rule for scheduling problems with a tardiness objective, is used to construct initial solutions, which are then improved by an exchange operation. Results are presented for generated instances.
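As an illustration of the ATC rule mentioned above, the sketch below computes the standard textbook ATC priority index, (w/p) * exp(-max(d - p - t, 0) / (k * p_avg)), and picks the released job with the highest index. It is a deterministic greedy step only and does not include the randomization that Meta-RaPS adds. The look-ahead parameter `k`, the use of the average over released jobs for `p_avg`, the job fields and the sample data are all assumptions made for this example.

```python
import math


def atc_index(weight, proc_time, due_date, now, k, mean_proc_time):
    """Apparent Tardiness Cost priority of one job at time `now`."""
    slack = max(due_date - proc_time - now, 0.0)
    return (weight / proc_time) * math.exp(-slack / (k * mean_proc_time))


def pick_next_job(jobs, now, k=2.0):
    """Return the released job with the highest ATC index, or None."""
    available = [j for j in jobs if j["release"] <= now]
    if not available:
        return None
    # Assumption: average processing time is taken over the released jobs.
    mean_p = sum(j["p"] for j in available) / len(available)
    return max(
        available,
        key=lambda j: atc_index(j["w"], j["p"], j["d"], now, k, mean_p),
    )


# Hypothetical instance: release time, processing time p, due date d, weight w.
jobs = [
    {"name": "A", "release": 0, "p": 4, "d": 20, "w": 1},
    {"name": "B", "release": 0, "p": 2, "d": 3, "w": 1},
    {"name": "C", "release": 5, "p": 1, "d": 6, "w": 5},
]
print(pick_next_job(jobs, now=0)["name"])
```

Job C is ignored at time 0 because it has not been released yet; a job with no slack left gets the full weight-to-processing-time ratio as its priority.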
Heuristic for Critical Machine Based a Lot Streaming for Two-Stage Hybrid Production Environment
NASA Astrophysics Data System (ADS)
Vivek, P.; Saravanan, R.; Chandrasekaran, M.; Pugazhenthi, R.
2017-03-01
Lot streaming in hybrid flowshops (HFS) is encountered in many real-world problems. This paper presents a heuristic approach to lot streaming based on critical machine consideration for a two-stage hybrid flowshop. The first stage has two identical parallel machines and the second stage has only one machine. The second-stage machine is considered critical for valid reasons; problems of this kind are known to be NP-hard. A mathematical model was developed for the selected problem. The simulation modelling and analysis were carried out in Extend V6 software. A heuristic was developed for obtaining an optimal lot streaming schedule. Eleven cases of lot streaming were considered. The proposed heuristic was verified and validated by real-time simulation experiments. All possible lot streaming strategies and possible sequences under each lot streaming strategy were simulated and examined. The heuristic consistently yielded the optimal schedule in all eleven cases. A procedure for identifying the best lot streaming strategy was suggested.
Buffered coscheduling for parallel programming and enhanced fault tolerance
Petrini, Fabrizio [Los Alamos, NM; Feng, Wu-chun [Los Alamos, NM
2006-01-31
A computer-implemented method schedules processor jobs on a network of parallel machine processors or distributed system processors. Control information communications generated by each process performed by each processor during a defined time interval are accumulated in buffers, where adjacent time intervals are separated by strobe intervals for a global exchange of control information. A global exchange of the control information communications at the end of each defined time interval is performed during an intervening strobe interval so that each processor is informed by all of the other processors of the number of incoming jobs to be received by each processor in a subsequent time interval. The buffered coscheduling method of this invention also enhances the fault tolerance of a network of parallel machine processors or distributed system processors.
Wave scheduling - Decentralized scheduling of task forces in multicomputers
NASA Technical Reports Server (NTRS)
Van Tilborg, A. M.; Wittie, L. D.
1984-01-01
Decentralized operating systems that control large multicomputers need techniques to schedule competing parallel programs called task forces. Wave scheduling is a probabilistic technique that uses a hierarchical distributed virtual machine to schedule task forces by recursively subdividing and issuing wavefront-like commands to processing elements capable of executing individual tasks. Wave scheduling is highly resistant to processing element failures because it uses many distributed schedulers that dynamically assign scheduling responsibilities among themselves. The scheduling technique is trivially extensible as more processing elements join the host multicomputer. A simple model of scheduling cost is used by every scheduler node to distribute scheduling activity and minimize wasted processing capacity by using perceived workload to vary decentralized scheduling rules. At low to moderate levels of network activity, wave scheduling is only slightly less efficient than a central scheduler in its ability to direct processing elements to accomplish useful work.
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware
Zheng, Da; Burns, Randal; Szalay, Alexander S.
2013-01-01
We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock contention in non-uniform memory architecture machines. We evaluate our design on a 32-core NUMA machine with four eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
Run-time scheduling and execution of loops on message passing machines
NASA Technical Reports Server (NTRS)
Crowley, Kay; Saltz, Joel; Mirchandaney, Ravi; Berryman, Harry
1989-01-01
Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
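The preprocessing steps named in this abstract (local numbering of the indices assigned to each processor, then precomputing what each processor needs from the others) can be illustrated with a small single-process sketch. This is not the authors' preprocessor: the block partition and the names `owner`, `inspector`, and `executor` are assumptions made for the example, and the "communication" is simulated with plain list reads.

```python
from collections import defaultdict

def owner(i, n, nprocs):
    """Block partition (an assumption): processor that owns global index i."""
    block = -(-n // nprocs)  # ceiling division
    return i // block

def inspector(n, nprocs, refs):
    """Precompute local numbering and, per processor, the off-processor data it needs.

    refs[i] lists the global indices read when computing element i.
    """
    local_index = [dict() for _ in range(nprocs)]  # global index -> local number
    for i in range(n):
        p = owner(i, n, nprocs)
        local_index[p][i] = len(local_index[p])
    needs = [defaultdict(set) for _ in range(nprocs)]
    for i in range(n):
        p = owner(i, n, nprocs)
        for j in refs[i]:
            q = owner(j, n, nprocs)
            if q != p:
                needs[p][q].add(j)  # p must fetch global index j from q
    schedule = [{q: sorted(s) for q, s in needs[p].items()} for p in range(nprocs)]
    return local_index, schedule

def executor(x, refs, local_index, schedule):
    """Compute y[i] = sum(x[j] for j in refs[i]) from local data plus fetched values."""
    nprocs = len(local_index)
    # Distribute x: each processor holds only the entries it owns.
    local_x = [[0.0] * len(local_index[p]) for p in range(nprocs)]
    for p in range(nprocs):
        for g, l in local_index[p].items():
            local_x[p][l] = x[g]
    y = [0.0] * len(x)
    for p in range(nprocs):
        # Simulated communication phase, driven by the precomputed schedule.
        ghosts = {}
        for q, wanted in schedule[p].items():
            for g in wanted:
                ghosts[g] = local_x[q][local_index[q][g]]
        # Computation phase.
        for g in local_index[p]:
            total = 0.0
            for j in refs[g]:
                if j in local_index[p]:
                    total += local_x[p][local_index[p][j]]
                else:
                    total += ghosts[j]
            y[g] = total
    return y
```

Because the schedule depends only on the reference pattern `refs`, it can be built once and reused for every execution of the loop, which is the point of separating the two phases.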
Run-time scheduling and execution of loops on message passing machines
NASA Technical Reports Server (NTRS)
Saltz, Joel; Crowley, Kathleen; Mirchandaney, Ravi; Berryman, Harry
1990-01-01
Sparse system solvers and general purpose codes for solving partial differential equations are examples of the many types of problems whose irregularity can result in poor performance on distributed memory machines. Often, the data structures used in these problems are very flexible. Crucial details concerning loop dependences are encoded in these structures rather than being explicitly represented in the program. Good methods for parallelizing and partitioning these types of problems require assignment of computations in rather arbitrary ways. Naive implementations of programs on distributed memory machines requiring general loop partitions can be extremely inefficient. Instead, the scheduling mechanism needs to capture the data reference patterns of the loops in order to partition the problem. First, the indices assigned to each processor must be locally numbered. Next, it is necessary to precompute what information is needed by each processor at various points in the computation. The precomputed information is then used to generate an execution template designed to carry out the computation, communication, and partitioning of data, in an optimized manner. The design is presented for a general preprocessor and schedule executer, the structures of which do not vary, even though the details of the computation and of the type of information are problem dependent.
NASA Astrophysics Data System (ADS)
Huang, J. D.; Liu, J. J.; Chen, Q. X.; Mao, N.
2017-06-01
Against a background of heat-treatment operations in mould manufacturing, a two-stage flow-shop scheduling problem is described for minimizing makespan with parallel batch-processing machines and re-entrant jobs. The weights and release dates of jobs are non-identical, but job processing times are equal. A mixed-integer linear programming model is developed and tested with small-scale scenarios. Given that the problem is NP-hard, three heuristic construction methods with polynomial complexity are proposed. The worst case of the new constructive heuristic is analysed in detail. A method for computing lower bounds is proposed to test heuristic performance. Heuristic efficiency is tested with sets of scenarios. The new constructive heuristic performs better than the two improved heuristics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lewis, M.; Grimshaw, A.
1996-12-31
The Legion project at the University of Virginia is an architecture for designing and building system services that provide the illusion of a single virtual machine to users, a virtual machine that provides secure shared object and shared name spaces, application adjustable fault-tolerance, improved response time, and greater throughput. Legion targets wide area assemblies of workstations, supercomputers, and parallel supercomputers. Legion tackles problems not solved by existing workstation based parallel processing tools; the system will enable fault-tolerance, wide area parallel processing, inter-operability, heterogeneity, a single global name space, protection, security, efficient scheduling, and comprehensive resource management. This paper describes the core Legion object model, which specifies the composition and functionality of Legion's core objects: those objects that cooperate to create, locate, manage, and remove objects in the Legion system. The object model facilitates a flexible extensible implementation, provides a single global name space, grants site autonomy to participating organizations, and scales to millions of sites and trillions of objects.
Proteus: a reconfigurable computational network for computer vision
NASA Astrophysics Data System (ADS)
Haralick, Robert M.; Somani, Arun K.; Wittenbrink, Craig M.; Johnson, Robert; Cooper, Kenneth; Shapiro, Linda G.; Phillips, Ihsin T.; Hwang, Jenq N.; Cheung, William; Yao, Yung H.; Chen, Chung-Ho; Yang, Larry; Daugherty, Brian; Lorbeski, Bob; Loving, Kent; Miller, Tom; Parkins, Larye; Soos, Steven L.
1992-04-01
The Proteus architecture is a highly parallel MIMD (multiple-instruction, multiple-data) machine, optimized for large granularity tasks such as machine vision and image processing. The system can achieve 20 Giga-flops (80 Giga-flops peak). It accepts data via multiple serial links at a rate of up to 640 megabytes/second. The system employs a hierarchical reconfigurable interconnection network with the highest level being a circuit switched Enhanced Hypercube serial interconnection network for internal data transfers. The system is designed to use 256 to 1,024 RISC processors. The processors use one megabyte external Read/Write Allocating Caches for reduced multiprocessor contention. The system detects, locates, and replaces faulty subsystems using redundant hardware to facilitate fault tolerance. The parallelism is directly controllable through an advanced software system for partitioning, scheduling, and development. System software includes a translator for the INSIGHT language, a parallel debugger, low and high level simulators, and a message passing system for all control needs. Image processing application software includes a variety of point operators, neighborhood operators, convolution, and the mathematical morphology operations of binary and gray scale dilation, erosion, opening, and closing.
NASA Astrophysics Data System (ADS)
Lary, D. J.
2013-12-01
A BigData case study is described where multiple datasets from several satellites, high-resolution global meteorological data, social media and in-situ observations are combined using machine learning on a distributed cluster using an automated workflow. The global particulate dataset is relevant to global public health studies and would not be possible to produce without the use of the multiple big datasets, in-situ data and machine learning. To greatly reduce the development time and enhance the functionality, a high-level language capable of parallel processing (Matlab) has been used. Key considerations for the system are high-speed access, needed because of the large data volume, persistence of the large data volumes, and a precise process time scheduling capability.
Code of Federal Regulations, 2011 CFR
2011-07-01
... machine cards not available from Federal Supply Schedule contracts. 101-26.509-2 Section 101-26.509-2... Programs § 101-26.509-2 Requisitioning tabulating machine cards not available from Federal Supply Schedule contracts. (a) Requisitions for tabulating machine cards covered by Federal Supply Schedule contracts which...
Multitasking the three-dimensional transport code TORT on CRAY platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Azmy, Y.Y.; Barnett, D.A.; Burre, C.A.
1996-04-01
The multitasking options in the three-dimensional neutral particle transport code TORT originally implemented for Cray's CTSS operating system are revived and extended to run on Cray Y/MP and C90 computers using the UNICOS operating system. These include two coarse-grained domain decompositions: across octants, and across directions within an octant, termed Octant Parallel (OP) and Direction Parallel (DP), respectively. Parallel performance of the DP is significantly enhanced by increasing the task grain size and reducing load imbalance via dynamic scheduling of the discrete angles among the participating tasks. Substantial Wall Clock speedup factors, approaching 4.5 using 8 tasks, have been measured in a time-sharing environment, and generally depend on the test problem specifications, number of tasks, and machine loading during execution.
Environmental concept for engineering software on MIMD computers
NASA Technical Reports Server (NTRS)
Lopez, L. A.; Valimohamed, K.
1989-01-01
The issues related to developing an environment in which engineering systems can be implemented on MIMD machines are discussed. The problem is presented in terms of implementing the finite element method under such an environment. However, neither the concepts nor the prototype implementation environment are limited to this application. The topics discussed include: the ability to schedule and synchronize tasks efficiently; granularity of tasks; load balancing; and the use of a high level language to specify parallel constructs, manage data, and achieve portability. The objective of developing a virtual machine concept which incorporates solutions to the above issues leads to a design that can be mapped onto loosely coupled, tightly coupled, and hybrid systems.
NASA Astrophysics Data System (ADS)
Sivarami Reddy, N.; Ramamurthy, D. V., Dr.; Prahlada Rao, K., Dr.
2017-08-01
This article addresses simultaneous scheduling of machines, AGVs and tools, where machines are allowed to share tools and the transfer times of jobs and tools between machines are considered, to generate optimal sequences that minimize makespan in a multi-machine Flexible Manufacturing System (FMS). The performance of an FMS is expected to improve through effective utilization of its resources and proper integration and synchronization of their scheduling. The Symbiotic Organisms Search (SOS) algorithm is a proven and potent alternative for solving optimization problems such as scheduling. The proposed SOS algorithm is tested on 22 job sets, with makespan as the objective, for scheduling of machines and tools where machines are allowed to share tools without considering transfer times of jobs and tools, and the results are compared with those of existing methods. The results show that SOS outperforms the existing methods. The same SOS algorithm is then used for simultaneous scheduling of machines, AGVs and tools, where machines are allowed to share tools and transfer times of jobs and tools are considered, to determine the sequences that minimize makespan.
Exploring Machine Learning Techniques For Dynamic Modeling on Future Exascale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Song, Shuaiwen; Tallent, Nathan R.; Vishnu, Abhinav
2013-09-23
Future exascale systems must be optimized for both power and performance at scale in order to achieve DOE’s goal of a sustained petaflop within 20 Megawatts by 2022 [1]. Massive parallelism of the future systems combined with complex memory hierarchies will form a barrier to efficient application and architecture design. These challenges are exacerbated with emerging complex architectures such as GPGPUs and Intel Xeon Phi as parallelism increases orders of magnitude and system power consumption can easily triple or quadruple. Therefore, we need techniques that can reduce the search space for optimization, isolate power-performance bottlenecks, identify root causes for software/hardware inefficiency, and effectively direct runtime scheduling.
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
Expert Systems for the Scheduling of Image Processing Tasks on a Parallel Processing System
1986-12-01
existed for over twenty years. Credit for designing and implementing the first computer vision system is usually given to L. G. Roberts [Robe65]. With...hardware differences between systems. LIST OF REFERENCES [Adam82] G. B. Adams III and H. J. Siegel, "The Extra Stage Cube: a Fault-Tolerant...Academic Press, 1985 [Robe65] L. G. Roberts, "Machine Perception of Three-Dimensional Solids," in Optical and Electro-Optical Information Processing, ed. J
NASA Astrophysics Data System (ADS)
Chang, Yung-Chia; Li, Vincent C.; Chiang, Chia-Ju
2014-04-01
Make-to-order or direct-order business models that require close interaction between production and distribution activities have been adopted by many enterprises in order to be competitive in demanding markets. This article considers an integrated production and distribution scheduling problem in which jobs are first processed by one of the unrelated parallel machines and then distributed to corresponding customers by capacitated vehicles without intermediate inventory. The objective is to find a joint production and distribution schedule so that the weighted sum of total weighted job delivery time and the total distribution cost is minimized. This article presents a mathematical model for describing the problem and designs an algorithm using ant colony optimization. Computational experiments illustrate that the algorithm developed is capable of generating near-optimal solutions. The computational results also demonstrate the value of integrating production and distribution in the model for the studied problem.
Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal
Modern hardware contains parallel execution resources that are well-suited for data parallelism (vector units) and task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.
Balancing Contention and Synchronization on the Intel Paragon
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Nicol, David M.
1996-01-01
The Intel Paragon is a mesh-connected distributed memory parallel computer. It uses an oblivious and deterministic message routing algorithm: this permits us to develop highly optimized schedules for frequently needed communication patterns. The complete exchange is one such pattern. Several approaches are available for carrying it out on the mesh. We study an algorithm developed by Scott. This algorithm assumes that a communication link can carry one message at a time and that a node can only transmit one message at a time. It requires global synchronization to enforce a schedule of transmissions. Unfortunately, global synchronization has substantial overhead on the Paragon. At the same time, the powerful interconnection mechanism of this machine permits 2 or 3 messages to share a communication link with minor overhead. It can also overlap multiple message transmissions from the same node to some extent. We develop a generalization of Scott's algorithm that executes complete exchange with a prescribed contention. Schedules that incur greater contention require fewer synchronization steps. This permits us to trade off contention against synchronization overhead. We describe the performance of this algorithm and compare it with Scott's original algorithm as well as with a naive algorithm that does not take interconnection structure into account. The bounded contention algorithm is always better than Scott's algorithm and outperforms the naive algorithm for all but the smallest message sizes. The naive algorithm fails to work on meshes larger than 12 x 12. These results show that due consideration of processor interconnect and machine performance parameters is necessary to obtain peak performance from the Paragon and its successor mesh machines.
An Improved Hierarchical Genetic Algorithm for Sheet Cutting Scheduling with Process Constraints
Rao, Yunqing; Qi, Dezhong; Li, Jinling
2013-01-01
For the first time, an improved hierarchical genetic algorithm is proposed for the sheet cutting problem in the integrated cutting stock model, which involves n cutting patterns for m non-identical parallel machines with process constraints. The objective of the cutting scheduling problem is to minimize the weighted completion time. A mathematical model for this problem is presented, an improved hierarchical genetic algorithm (ant colony-hierarchical genetic algorithm) is developed to obtain better solutions, and a hierarchical coding method is used based on the characteristics of the problem. Furthermore, to speed up convergence and avoid premature convergence to local optima, adaptive crossover and mutation probabilities are used in this algorithm. The computational results and comparison show that the presented approach is quite effective for the considered problem. PMID:24489491
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86 based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi coprocessor (MIC).
Dynamically allocating sets of fine-grained processors to running computations
NASA Technical Reports Server (NTRS)
Middleton, David
1988-01-01
Researchers explore an approach to using general purpose parallel computers which involves mapping hardware resources onto computations instead of mapping computations onto hardware. Problems such as processor allocation, task scheduling and load balancing, which have traditionally proven to be challenging, change significantly under this approach and may become amenable to new attacks. Researchers describe the implementation of this approach used by the FFP Machine whose computation and communication resources are repeatedly partitioned into disjoint groups that match the needs of available tasks from moment to moment. Several consequences of this system are examined.
Debugging Fortran on a shared memory machine
DOE Office of Scientific and Technical Information (OSTI.GOV)
Allen, T.R.; Padua, D.A.
1987-01-01
Debugging on a parallel processor is more difficult than debugging on a serial machine because errors in a parallel program may introduce nondeterminism. The approach to parallel debugging presented here attempts to reduce the problem of debugging on a parallel machine to that of debugging on a serial machine by automatically detecting nondeterminism. 20 refs., 6 figs.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry
2005-01-01
Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order of magnitude compared to perfect gas flow, partly because of the increased complexity of the equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. The Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented in both perfect and real gas flow codes including the Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as the SGI systems. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in the INS3D-LU code, which has been one of the base codes for the NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using OpenMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
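The hyperplane idea in this abstract can be illustrated on a toy 2-D grid, where the "hyperplane" reduces to an anti-diagonal i + j = k. The sketch below is not the RGAS or INS3D-LU code: the update formula, the function names, and the 2-D setting are assumptions chosen only to show that points on one hyperplane depend solely on earlier hyperplanes, so they can be updated independently while giving the same result as the sequential sweep.

```python
def sweep_sequential(u, f):
    """Lexicographic forward sweep: each point uses already-updated neighbors
    at (i-1, j) and (i, j-1), which is what makes Gauss-Seidel sequential."""
    n, m = len(u), len(u[0])
    for i in range(1, n):
        for j in range(1, m):
            u[i][j] = 0.5 * (u[i - 1][j] + u[i][j - 1]) + f[i][j]
    return u

def hyperplanes(n, m):
    """Yield interior points grouped by i + j = k, in increasing k.
    Points within one group do not depend on each other."""
    for k in range(2, n + m - 1):
        lo = max(1, k - m + 1)
        hi = min(n - 1, k - 1)
        yield [(i, k - i) for i in range(lo, hi + 1)]

def sweep_hyperplane(u, f):
    """Same sweep, reordered by hyperplane. The inner loop is the part that
    could be split across processors (here it simply runs serially)."""
    for plane in hyperplanes(len(u), len(u[0])):
        for i, j in plane:
            u[i][j] = 0.5 * (u[i - 1][j] + u[i][j - 1]) + f[i][j]
    return u
```

In three dimensions the groups would be the planes i + j + k = const; the amount of available parallelism then varies from plane to plane, which is one reason load balance matters when partitioning them.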
Compile-Time Partitioning and Scheduling of Parallel Programs. Extended Summary,
1986-01-01
...PROGRAMS EXTENDED SUMMARY(U) STANFORD UNIV CA COMPUTER SYSTEMS LAB V SARKAR ET AL. 1986 ... UNCLASSIFIED F/G 9/2 ... J. A. et al. "Parallel Processing: A Smart Compiler and a Dumb Machine". SIGPLAN Notices 19, 6 (June 1984). 8. Gajski, D. D., Padua, D. K. & Kuck, D
NASA Astrophysics Data System (ADS)
Buddala, Raviteja; Mahapatra, Siba Sankar
2017-11-01
The flexible flow shop (or hybrid flow shop) scheduling problem is an extension of the classical flow shop scheduling problem. In a simple flow shop configuration, a job having g operations is performed on g operation centres (stages), with each stage having only one machine. If any stage contains more than one machine to provide an alternate processing facility, the problem becomes a flexible flow shop problem (FFSP). FFSP, which contains all the complexities of both simple flow shop and parallel machine scheduling problems, is a well-known NP-hard (non-deterministic polynomial-time hard) problem. Owing to the high computational complexity involved in solving these problems, it is not always possible to obtain an optimal solution in a reasonable computation time. To obtain near-optimal solutions in a reasonable computation time, a large variety of meta-heuristics have been proposed in the past. However, tuning algorithm-specific parameters for solving FFSP is rather tricky and time consuming. To address this limitation, teaching-learning-based optimization (TLBO) and the JAYA algorithm are chosen for the study because they are recent meta-heuristics that do not require tuning of algorithm-specific parameters. Although these algorithms seem elegant, they lose solution diversity after a few iterations and get trapped at local optima. To alleviate this drawback, a new local search procedure is proposed in this paper to improve the solution quality. Further, a mutation strategy (inspired by genetic algorithms) is incorporated in the basic algorithm to maintain solution diversity in the population. Computational experiments have been conducted on standard benchmark problems to calculate makespan and computational time. It is found that the rate of convergence of TLBO is superior to that of JAYA. The results also show that TLBO and JAYA outperform many algorithms reported in the literature and can be treated as efficient methods for solving the FFSP.
Two-machine flow shop scheduling integrated with preventive maintenance planning
NASA Astrophysics Data System (ADS)
Wang, Shijin; Liu, Ming
2016-02-01
This paper investigates an integrated optimisation problem of production scheduling and preventive maintenance (PM) in a two-machine flow shop with time to failure of each machine subject to a Weibull probability distribution. The objective is to find the optimal job sequence and the optimal PM decisions before each job such that the expected makespan is minimised. To investigate the value of integrated scheduling solution, computational experiments on small-scale problems with different configurations are conducted with total enumeration method, and the results are compared with those of scheduling without maintenance but with machine degradation, and individual job scheduling combined with independent PM planning. Then, for large-scale problems, four genetic algorithm (GA) based heuristics are proposed. The numerical results with several large problem sizes and different configurations indicate the potential benefits of integrated scheduling solution and the results also show that proposed GA-based heuristics are efficient for the integrated problem.
Scheduling job shop - A case study
NASA Astrophysics Data System (ADS)
Abas, M.; Abbas, A.; Khan, W. A.
2016-08-01
Scheduling in a job shop is important for efficient utilization of machines in the manufacturing industry. There are a number of algorithms available for scheduling jobs, which depend on machine tools, indirect consumables and the jobs to be processed. In this paper a case study is presented for scheduling jobs when parts are treated on available machines. Through a time and motion study, setup time and operation time are measured and together taken as the total processing time for a variety of products having different manufacturing processes. Based on due dates, different levels of priority are assigned to the jobs, and the jobs are scheduled on the basis of priority. In view of the measured processing times, the times for processing some new jobs are estimated, and an algorithm for efficient utilization of the available machines is proposed and validated.
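As a rough illustration of due-date-based priority scheduling with processing time taken as setup time plus operation time, a minimal sketch follows. It is not the algorithm proposed in the paper: it assumes single-operation jobs on identical machines, treats the earliest due date as the highest priority, and the function name and tuple layout are hypothetical.

```python
import heapq

def schedule_by_due_date(jobs, n_machines):
    """jobs: list of (name, setup_time, operation_time, due_date).

    Returns a list of (name, machine, start, finish, tardiness), built by
    taking jobs in order of due date and giving each to the machine that
    becomes free first.
    """
    machines = [(0.0, m) for m in range(n_machines)]  # (time free, machine id)
    heapq.heapify(machines)
    plan = []
    for name, setup, op, due in sorted(jobs, key=lambda job: job[3]):
        free_at, m = heapq.heappop(machines)
        finish = free_at + setup + op  # total processing time = setup + operation
        plan.append((name, m, free_at, finish, max(0.0, finish - due)))
        heapq.heappush(machines, (finish, m))
    return plan
```

For example, `schedule_by_due_date([("A", 1, 4, 10), ("B", 2, 2, 5), ("C", 1, 1, 6)], 2)` processes B and C first on separate machines and then places A on whichever machine frees up first.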
Implementation and performance of parallel Prolog interpreter
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, S.; Kale, L.V.; Balkrishna, R.
1988-01-01
In this paper, the authors discuss the implementation of a parallel Prolog interpreter on different parallel machines. The implementation is based on the REDUCE--OR process model, which exploits both AND and OR parallelism in logic programs. It is machine independent, as it runs on top of the chare-kernel, a machine-independent parallel programming system. The authors also give the performance of the interpreter running a diverse set of benchmark programs on parallel machines, including shared memory systems (an Alliant FX/8, a Sequent and a MultiMax) and a non-shared memory system (an Intel iPSC/32 hypercube), in addition to its performance on a multiprocessor simulation system.
Prediction based proactive thermal virtual machine scheduling in green clouds.
Kinger, Supriya; Kumar, Rajesh; Sharma, Anju
2014-01-01
Cloud computing has rapidly emerged as a widely accepted computing paradigm, but the research on Cloud computing is still at an early stage. Cloud computing provides many advanced features but it still has some shortcomings such as relatively high operating cost and environmental hazards like increasing carbon footprints. These hazards can be reduced to some extent by efficient scheduling of Cloud resources. The working temperature at which a machine is currently running can be taken as a criterion for Virtual Machine (VM) scheduling. This paper proposes a new proactive technique that considers the current and maximum threshold temperatures of Server Machines (SMs) before making scheduling decisions with the help of a temperature predictor, so that the maximum temperature is never reached. Different workload scenarios have been taken into consideration. The results obtained show that the proposed system is better than existing systems of VM scheduling, which do not consider the current temperature of nodes before making scheduling decisions. Thus, a reduction in the need for cooling systems in a Cloud environment has been obtained and validated.
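The abstract above describes the decision rule only in outline. A minimal Python sketch of one way such a rule could look follows; it is not the authors' algorithm. The `Server` fields, the `pick_server` helper, and the fixed-rise `linear_predictor` are all hypothetical names and assumptions introduced here for illustration, standing in for the paper's temperature predictor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Server:
    name: str
    current_temp: float  # degrees Celsius (hypothetical sensor reading)
    max_temp: float      # threshold that should never be reached


def linear_predictor(server: Server, rise_per_vm: float = 5.0) -> float:
    """Assumed stand-in predictor: one more VM adds a fixed temperature rise."""
    return server.current_temp + rise_per_vm


def pick_server(servers: List[Server],
                predict_temp: Callable[[Server], float]) -> Optional[Server]:
    """Return the server with the most predicted thermal headroom.

    Servers whose predicted temperature would reach the threshold are
    skipped; None is returned when no server has positive headroom.
    """
    best, best_headroom = None, 0.0
    for s in servers:
        headroom = s.max_temp - predict_temp(s)
        if headroom > best_headroom:
            best, best_headroom = s, headroom
    return best
```

Choosing the largest headroom is only one possible policy; the point of the sketch is that the prediction is consulted before placement rather than after a threshold is crossed.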
Taming Wild Horses: The Need for Virtual Time-based Scheduling of VMs in Network Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S; Henz, Brian J
2012-01-01
The next generation of scalable network simulators employ virtual machines (VMs) to act as high-fidelity models of traffic producer/consumer nodes in simulated networks. However, network simulations could be inaccurate if VMs are not scheduled according to virtual time, especially when many VMs are hosted per simulator core in a multi-core simulator environment. Since VMs are by default free-running, at the outset it is not clear if, and to what extent, their untamed execution affects the results in simulated scenarios. Here, we provide the first quantitative basis for establishing the need for generalized virtual time scheduling of VMs in network simulators, based on actual prototyped implementations. To exercise breadth, our system is tested with multiple disparate applications: (a) a set of message passing parallel programs, (b) a computer worm propagation phenomenon, and (c) a mobile ad-hoc wireless network simulation. We define and use error metrics and benchmarks in scaled tests to empirically report the poor match of traditional, fairness-based VM scheduling to VM-based network simulation, and also clearly show the better performance of our simulation-specific scheduler, with up to 64 VMs hosted on a 12-core simulator node.
A general purpose subroutine for fast fourier transform on a distributed memory parallel machine
NASA Technical Reports Server (NTRS)
Dubey, A.; Zubair, M.; Grosch, C. E.
1992-01-01
One issue which is central in developing a general purpose Fast Fourier Transform (FFT) subroutine on a distributed memory parallel machine is the data distribution. It is possible that different users would like to use the FFT routine with different data distributions. Thus, there is a need to design FFT schemes on distributed memory parallel machines which can support a variety of data distributions. An FFT implementation on a distributed memory parallel machine which works for a number of data distributions commonly encountered in scientific applications is presented. The problem of rearranging the data after computing the FFT is also addressed. The performance of the implementation on a distributed memory parallel machine Intel iPSC/860 is evaluated.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vydyanathan, Naga; Krishnamoorthy, Sriram; Sabin, Gerald M.
2009-08-01
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application-tasks with dependences. These applications exhibit both task- and data-parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task- and data-parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks and the inter-task data communication volumes. A locality conscious scheduling strategy is used to improve inter-task data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications as well as synthetic graphs shows that our algorithm consistently generates schedules with lower makespan as compared to CPR and CPA, two previously proposed scheduling algorithms. Our algorithm also produces schedules that have lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.
Observing with HST V: Improvements to the Scheduling of HST Parallel Observations
NASA Astrophysics Data System (ADS)
Taylor, D. K.; Vanorsow, D.; Lucks, M.; Henry, R.; Ratnatunga, K.; Patterson, A.
1994-12-01
Recent improvements to the Hubble Space Telescope (HST) ground system have significantly increased the frequency of pure parallel observations, i.e. the simultaneous use of multiple HST instruments by different observers. Opportunities for parallel observations are limited by a variety of timing, hardware, and scientific constraints. Formerly, such opportunities were heuristically predicted prior to the construction of the primary schedule (or calendar), and lack of complete information resulted in high rates of scheduling failures and missed opportunities. In the current process the search for parallel opportunities is delayed until the primary schedule is complete, at which point new software tools are employed to identify places where parallel observations are supported. The result has been a considerable increase in parallel throughput. A new technique, known as ``parallel crafting,'' is currently under development to streamline further the parallel scheduling process. This radically new method will replace the standard exposure logsheet with a set of abstract rules from which observation parameters will be constructed ``on the fly'' to best match the constraints of the parallel opportunity. Currently, parallel observers must specify a huge (and highly redundant) set of exposure types in order to cover all possible types of parallel opportunities. Crafting rules permit the observer to express timing, filter, and splitting preferences in a far more succinct manner. The issue of coordinated parallel observations (same PI using different instruments simultaneously), long a troublesome aspect of the ground system, is also being addressed. For Cycle 5, the Phase II Proposal Instructions now have an exposure-level PAR WITH special requirement. While only the primary's alignment will be scheduled on the calendar, new commanding will provide for parallel exposures with both instruments.
NASA Technical Reports Server (NTRS)
Shearrow, Charles A.
1999-01-01
One of the identified goals of EM3 is to implement virtual manufacturing by the time the year 2000 has ended. To realize this goal of a true virtual manufacturing enterprise the initial development of a machinability database and the infrastructure must be completed. This will consist of the containment of the existing EM-NET problems and developing machine, tooling, and common materials databases. To integrate the virtual manufacturing enterprise with normal day to day operations the development of a parallel virtual manufacturing machinability database, virtual manufacturing database, virtual manufacturing paradigm, implementation/integration procedure, and testable verification models must be constructed. Common and virtual machinability databases will include the four distinct areas of machine tools, available tooling, common machine tool loads, and a materials database. The machine tools database will include the machine envelope, special machine attachments, tooling capacity, location within NASA-JSC or with a contractor, and availability/scheduling. The tooling database will include available standard tooling, custom in-house tooling, tool properties, and availability. The common materials database will include materials thickness ranges, strengths, types, and their availability. The virtual manufacturing databases will consist of virtual machines and virtual tooling directly related to the common and machinability databases. The items to be completed are the design and construction of the machinability databases, virtual manufacturing paradigm for NASA-JSC, implementation timeline, VNC model of one bridge mill and troubleshoot existing software and hardware problems with EN4NET. The final step of this virtual manufacturing project will be to integrate other production sites into the databases bringing JSC's EM3 into a position of becoming a clearing house for NASA's digital manufacturing needs creating a true virtual manufacturing enterprise.
Prediction Based Proactive Thermal Virtual Machine Scheduling in Green Clouds
Kinger, Supriya; Kumar, Rajesh; Sharma, Anju
2014-01-01
Cloud computing has rapidly emerged as a widely accepted computing paradigm, but the research on Cloud computing is still at an early stage. Cloud computing provides many advanced features but it still has some shortcomings such as relatively high operating cost and environmental hazards like increasing carbon footprints. These hazards can be reduced to some extent by efficient scheduling of Cloud resources. The working temperature at which a machine is currently running can be taken as a criterion for Virtual Machine (VM) scheduling. This paper proposes a new proactive technique that considers the current and maximum threshold temperatures of Server Machines (SMs) before making scheduling decisions with the help of a temperature predictor, so that the maximum temperature is never reached. Different workload scenarios have been taken into consideration. The results obtained show that the proposed system is better than existing systems of VM scheduling, which do not consider the current temperature of nodes before making scheduling decisions. Thus, a reduction in the need for cooling systems in a Cloud environment has been obtained and validated. PMID:24737962
Parallel machine architecture and compiler design facilities
NASA Technical Reports Server (NTRS)
Kuck, David J.; Yew, Pen-Chung; Padua, David; Sameh, Ahmed; Veidenbaum, Alex
1990-01-01
The objective is to provide an integrated simulation environment for studying and evaluating various issues in designing parallel systems, including machine architectures, parallelizing compiler techniques, and parallel algorithms. The status of the Delta project (whose objective is to provide a facility to allow rapid prototyping of parallelizing compilers that can target different machine architectures) is summarized. Included are surveys of the program manipulation tools developed, the environmental software supporting Delta, and the compiler research projects in which Delta has played a role.
Locality Aware Concurrent Start for Stencil Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Gao, Guang R.; Manzano Franco, Joseph B.
Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these techniques, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these optimization techniques might not be able to fully exploit locality (both spatial and temporal) on multiple levels of the memory hierarchy without compromising parallelism. It is no longer true that the machine can be seen as a homogeneous collection of nodes with caches, main memory and an interconnect network. New architectural designs exhibit complex grouping of nodes, cores, threads, caches and memory connected by an ever evolving network-on-chip design. These new designs may benefit greatly from carefully crafted schedules and groupings that encourage parallel actors (i.e. threads, cores or nodes) to be aware of the computational history of other actors in close proximity. In this paper, we provide an efficient tiling technique that allows hierarchical concurrent start for memory hierarchy aware tile groups. Each execution schedule and tile shape exploit the available parallelism, load balance and locality present in the given applications. We demonstrate our technique on the Intel Xeon Phi architecture with selected and representative stencil kernels. We show improvement ranging from 5.58% to 31.17% over existing state-of-the-art techniques.
Advanced Numerical Techniques of Performance Evaluation. Volume 1
1990-06-01
system scheduling thread. The scheduling thread then runs any other ready thread that can be found. A thread can only sleep or switch out on itself...Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Transactions on Computers C...Kuck 1987] C.D. Polychronopoulos and D.J. Kuck. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers. IEEE Trans. on Comp
Proposed algorithm to improve job shop production scheduling using ant colony optimization method
NASA Astrophysics Data System (ADS)
Pakpahan, Eka KA; Kristina, Sonna; Setiawan, Ari
2017-12-01
This paper deals with the determination of a job shop production schedule in an automated environment. In this particular environment, machines and the material handling system are integrated and controlled by a computer center where schedules are created and then used to dictate the movement of parts and the operations at each machine. This setting is usually designed to have an unmanned production process for a specified time interval. We consider here parts with various operation requirements. Each operation requires specific cutting tools. These parts are to be scheduled on machines each having identical capability, meaning that each machine is equipped with a similar set of cutting tools and is therefore capable of processing any operation. The availability of a particular machine to process a particular operation is determined by the remaining lifetime of its cutting tools. We propose an algorithm based on the ant colony optimization method and implement it in MATLAB to generate a production schedule which minimizes the total processing time of the parts (makespan). We test the algorithm on data provided by real industry, and the process shows a very short computation time. This contributes a lot to the flexibility and timeliness targeted in an automated environment.
49 CFR 214.533 - Schedule of repairs subject to availability of parts.
Code of Federal Regulations, 2011 CFR
2011-10-01
... Maintenance Machines and Hi-Rail Vehicles § 214.533 Schedule of repairs subject to availability of parts. (a... 49 Transportation 4 2011-10-01 2011-10-01 false Schedule of repairs subject to availability of... maintenance machine or a hi-rail vehicle by the end of the next business day following the report of the...
49 CFR 214.533 - Schedule of repairs subject to availability of parts.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Maintenance Machines and Hi-Rail Vehicles § 214.533 Schedule of repairs subject to availability of parts. (a... maintenance machine or a hi-rail vehicle by the end of the next business day following the report of the... maintenance machine or hi-rail vehicle within seven calendar days after receiving the necessary part. The...
Preconditioned implicit solvers for the Navier-Stokes equations on distributed-memory machines
NASA Technical Reports Server (NTRS)
Ajmani, Kumud; Liou, Meng-Sing; Dyson, Rodger W.
1994-01-01
The GMRES method is parallelized, and combined with local preconditioning to construct an implicit parallel solver to obtain steady-state solutions for the Navier-Stokes equations of fluid flow on distributed-memory machines. The new implicit parallel solver is designed to preserve the convergence rate of the equivalent 'serial' solver. A static domain-decomposition is used to partition the computational domain amongst the available processing nodes of the parallel machine. The SPMD (Single-Program Multiple-Data) programming model is combined with message-passing tools to develop the parallel code on a 32-node Intel Hypercube and a 512-node Intel Delta machine. The implicit parallel solver is validated for internal and external flow problems, and is found to compare identically with flow solutions obtained on a Cray Y-MP/8. A peak computational speed of 2300 MFlops/sec has been achieved on 512 nodes of the Intel Delta machine, for a problem size of 1024 K equations (256 K grid points).
NASA Technical Reports Server (NTRS)
Moore, J. E.
1975-01-01
An enumeration algorithm is presented for solving a scheduling problem similar to the single machine job shop problem with sequence dependent setup times. The scheduling problem differs from the job shop problem in two ways. First, its objective is to select an optimum subset of the available tasks to be performed during a fixed period of time. Secondly, each task scheduled is constrained to occur within its particular scheduling window. The algorithm is currently being used to develop typical observational timelines for a telescope that will be operated in earth orbit. Computational times associated with timeline development are presented.
NASA Astrophysics Data System (ADS)
Wang, Liping; Jiang, Yao; Li, Tiemin
2014-09-01
Parallel kinematic machines have drawn considerable attention and have been widely used in some special fields. However, high precision is still one of the challenges when they are used for advanced machine tools. One of the main reasons is that the kinematic chains of parallel kinematic machines are composed of elongated links that can easily suffer deformations, especially at high speeds and under heavy loads. A 3-RRR parallel kinematic machine is taken as a study object for investigating its accuracy with the consideration of the deformations of its links during the motion process. Based on the dynamic model constructed by the Newton-Euler method, all the inertia loads and constraint forces of the links are computed and their deformations are derived. Then the kinematic errors of the machine are derived with the consideration of the deformations of the links. Through further derivation, the accuracy of the machine is given in a simple explicit expression, which will be helpful to increase the calculating speed. The accuracy of this machine when following a selected circle path is simulated. The influences of magnitude of the maximum acceleration and external loads on the running accuracy of the machine are investigated. The results show that the external loads will deteriorate the accuracy of the machine tremendously when their direction coincides with the direction of the worst stiffness of the machine. The proposed method provides a solution for predicting the running accuracy of the parallel kinematic machines and can also be used in their design optimization as well as selection of suitable running parameters.
Modelling machine ensembles with discrete event dynamical system theory
NASA Technical Reports Server (NTRS)
Hunter, Dan
1990-01-01
Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory, are described by the following: a set of system and transition states, an event alphabet that portrays actions that take a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is represented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).
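The five ingredients of a local model listed in the abstract above map naturally onto a small data structure. The following Python sketch is an illustration under stated assumptions, not the report's formulation: the class name `LocalModel`, its field names, and the gripper example in the usage note are hypothetical, and event durations are assumed to be fixed numbers attached to each transition.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass
class LocalModel:
    states: Set[str]                   # system and transition states
    alphabet: Set[str]                 # event alphabet
    initial: str                       # initial system state
    # Partial function: (state, event) -> (next state, time for the event).
    # Pairs that are absent from the dict are undefined transitions.
    transitions: Dict[Tuple[str, str], Tuple[str, float]]

    def run(self, events: Iterable[str]) -> Tuple[str, float]:
        """Apply a sequence of events; return the final state and total time."""
        state, elapsed = self.initial, 0.0
        for event in events:
            key = (state, event)
            if key not in self.transitions:
                raise ValueError(f"event {event!r} is undefined in state {state!r}")
            state, duration = self.transitions[key]
            elapsed += duration
        return state, elapsed
```

For example, a hypothetical gripper submachine with states `idle` and `holding`, a `grasp` event taking 2.0 time units and a `release` event taking 1.5, ends in `idle` after 3.5 time units when given `["grasp", "release"]`. An open-loop task schedule in this picture is simply a fixed event sequence passed to `run`.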
Nadkarni, P M; Miller, P L
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine) running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data, issues that would be of paramount importance in other parallel architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Ferrandi, Fabrizio
Emerging applications such as data mining, bioinformatics, knowledge discovery, and social network analysis are irregular. They use data structures based on pointers or linked lists, such as graphs, unbalanced trees or unstructured grids, which generate unpredictable memory accesses. These data structures usually are large, but difficult to partition. These applications mostly are memory bandwidth bounded and have high synchronization intensity. However, they also have large amounts of inherent dynamic parallelism, because they potentially perform a task for each one of the elements they are exploring. Several efforts are looking at accelerating these applications on hybrid architectures, which integrate general purpose processors with reconfigurable devices. Some solutions, which demonstrated significant speedups, include custom hand-tuned accelerators or even full processor architectures on the reconfigurable logic. In this paper we present an approach for the automatic synthesis of accelerators from C, targeted at irregular applications. In contrast to typical High Level Synthesis paradigms, which construct a centralized Finite State Machine, our approach generates dynamically scheduled hardware components. While parallelism exploitation in typical HLS-generated accelerators is usually bound within a single execution flow, our solution allows concurrently running multiple execution flows, thus also exploiting the coarser grain task parallelism of irregular applications. Our approach supports multiple, multi-ported and distributed memories, and atomic memory operations. Its main objective is parallelizing as many memory operations as possible, independently from their execution time, to maximize the memory bandwidth utilization. This significantly differs from current HLS flows, which usually consider a single memory port and require precise scheduling of memory operations.
A key innovation of our approach is the generation of a memory interface controller, which dynamically maps concurrent memory accesses to multiple ports. We present a case study on a typical irregular kernel, Graph Breadth First search (BFS), exploring different tradeoffs in terms of parallelism and number of memories.
Sensor-scheduling simulation of disparate sensors for Space Situational Awareness
NASA Astrophysics Data System (ADS)
Hobson, T.; Clarkson, I.
2011-09-01
The art and science of space situational awareness (SSA) has been practised and developed from the time of Sputnik. However, recent developments, such as the accelerating pace of satellite launch, the proliferation of launch capable agencies, both commercial and sovereign, and recent well-publicised collisions involving man-made space objects, have further magnified the importance of timely and accurate SSA. The United States Strategic Command (USSTRATCOM) operates the Space Surveillance Network (SSN), a global network of sensors tasked with maintaining SSA. The rapidly increasing number of resident space objects will require commensurate improvements in the SSN. Sensors are scarce resources that must be scheduled judiciously to obtain measurements of maximum utility. Improvements in sensor scheduling and fusion can serve to reduce the number of additional sensors that may be required. Recently, Hill et al. [1] have proposed and developed a simulation environment named TASMAN (Tasking Autonomous Sensors in a Multiple Application Network) to enable testing of alternative scheduling strategies within a simulated multi-sensor, multi-target environment. TASMAN simulates a high-fidelity, hardware-in-the-loop system by running multiple machines with different roles in parallel. At present, TASMAN is limited to simulations involving electro-optic sensors. Its high fidelity is at once a feature and a limitation, since supercomputing is required to run simulations of appreciable scale. In this paper, we describe an alternative, modular and scalable SSA simulation system that can extend the work of Hill et al. with reduced complexity, albeit also with reduced fidelity. The tool has been developed in MATLAB and therefore can be run on a very wide range of computing platforms. It can also make use of MATLAB’s parallel processing capabilities to obtain considerable speed-up.
The speed and flexibility so obtained can be used to quickly test scheduling algorithms even with a relatively large number of space objects. We further describe an application of the tool by exploring how the relative mixture of electro-optical and radar sensors can impact the scheduling, fusion and achievable accuracy of an SSA system. By varying the mixture of sensor types, we are able to characterise the main advantages and disadvantages of each configuration.
On program restructuring, scheduling, and communication for parallel processor systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Polychronopoulos, Constantine D.
1986-08-01
This dissertation discusses several software and hardware aspects of program execution on large-scale, high-performance parallel processor systems. The issues covered are program restructuring, partitioning, scheduling and interprocessor communication, synchronization, and hardware design issues of specialized units. All this work was performed focusing on a single goal: to maximize program speedup, or equivalently, to minimize parallel execution time. Parafrase, a Fortran restructuring compiler, was used to transform programs into a parallel form and conduct experiments. Two new program restructuring techniques are presented, loop coalescing and subscript blocking. Compile-time and run-time scheduling schemes are covered extensively. Depending on the program construct, these algorithms generate optimal or near-optimal schedules. For the case of arbitrarily nested hybrid loops, two optimal scheduling algorithms for dynamic and static scheduling are presented. Simulation results are given for a new dynamic scheduling algorithm. The performance of this algorithm is compared to that of self-scheduling. Techniques for program partitioning and minimization of interprocessor communication for idealized program models and for real Fortran programs are also discussed. The close relationship between scheduling, interprocessor communication, and synchronization becomes apparent at several points in this work. Finally, the impact of various types of overhead on program speedup and experimental results are presented.
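Loop coalescing, named in the abstract above, is commonly understood as rewriting a nest of loops as a single loop over the combined iteration space, with the original indices recovered arithmetically, so that a scheduler only has to distribute one flat range. The Python sketch below illustrates that general idea for a two-deep nest; it is an assumption-labeled illustration, not the dissertation's compiler transformation, and the function names are hypothetical.

```python
from typing import List, Tuple


def nested_iterations(n: int, m: int) -> List[Tuple[int, int]]:
    """Original form: a doubly nested loop over an n-by-m iteration space."""
    visited = []
    for i in range(n):
        for j in range(m):
            visited.append((i, j))
    return visited


def coalesced_iterations(n: int, m: int) -> List[Tuple[int, int]]:
    """Coalesced form: one loop of n*m iterations.

    The original indices are recovered from the flat index k by integer
    division and remainder, so each k can be handed to any processor
    independently of the others.
    """
    visited = []
    for k in range(n * m):
        i, j = divmod(k, m)
        visited.append((i, j))
    return visited
```

Both functions visit the same index pairs in the same order; the coalesced version trades a division and a remainder per iteration for a single, evenly divisible iteration range.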
Reverse time migration: A seismic processing application on the connection machine
NASA Technical Reports Server (NTRS)
Fiebrich, Rolf-Dieter
1987-01-01
The implementation of a reverse time migration algorithm on the Connection Machine, a massively parallel computer, is described. Essential architectural features of this machine as well as programming concepts are presented. The data structures and parallel operations for the implementation of the reverse time migration algorithm are described. The algorithm matches the Connection Machine architecture closely and executes almost at the peak performance of this machine.
Bidding-based autonomous process planning and scheduling
NASA Astrophysics Data System (ADS)
Gu, Peihua; Balasubramanian, Sivaram; Norrie, Douglas H.
1995-08-01
Improving productivity through computer integrated manufacturing systems (CIMS) and concurrent engineering requires that the islands of automation in an enterprise be completely integrated. The first step in this direction is to integrate design, process planning, and scheduling. This can be achieved through a bidding-based process planning approach. The product is represented in a STEP model with detailed design and administrative information including design specifications, batch size, and due dates. Upon arrival at the manufacturing facility, the product is registered with the shop floor manager, which is essentially a coordinating agent. The shop floor manager broadcasts the product's requirements to the machines. The shop contains autonomous machines that have knowledge about their functionality, capabilities, tooling, and schedule. Each machine has its own process planner and responds to the product's request in a different way that is consistent with its capabilities and capacities. When more than one machine offers certain process(es) for the same requirements, they enter into negotiation. Based on processing time, due date, and cost, one of the machines wins the contract. The successful machine updates its schedule and advises the product to request raw material for processing. The concept was implemented using a multi-agent system with the task decomposition and planning achieved through contract nets. Examples are included to illustrate the approach.
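The award step of such a negotiation can be sketched as follows. The abstract only says that the winner is chosen on processing time, due date, and cost; the `Bid` record and the specific ranking rule used here (on-time bids first, then lower cost, then earlier finish) are hypothetical and serve only to make the idea concrete.

```python
from collections import namedtuple

# Hypothetical bid record: which machine offers, when it would finish, at what cost.
Bid = namedtuple("Bid", "machine finish_time cost")


def award_contract(bids, due_date):
    """Pick the winning bid for one process request.

    Assumed ranking (not from the source): bids that meet the due date
    come first, then lower cost, then earlier finish time.
    """
    if not bids:
        return None
    return min(bids, key=lambda b: (b.finish_time > due_date, b.cost, b.finish_time))
```

A shop floor manager built along these lines would call `award_contract` once per broadcast request and then let the winning machine update its own schedule.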
Nadkarni, P. M.; Miller, P. L.
1991-01-01
A parallel program for inter-database sequence comparison was developed on the Intel Hypercube using two models of parallel programming. One version was built using machine-specific Hypercube parallel programming commands. The other version was built using Linda, a machine-independent parallel programming language. The two versions of the program provide a case study comparing these two approaches to parallelization in an important biological application area. Benchmark tests with both programs gave comparable results with a small number of processors. As the number of processors was increased, the Linda version was somewhat less efficient. The Linda version was also run without change on Network Linda, a virtual parallel machine running on a network of desktop workstations. PMID:1807632
Line-drawing algorithms for parallel machines
NASA Technical Reports Server (NTRS)
Pang, Alex T.
1990-01-01
The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives; distance to a line (a point is on the line if sufficiently close to it) and intersection with a line (a point on the line if an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
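The distance-to-a-line alternative can be sketched as a test that every pixel evaluates independently. The tolerance of half a pixel, the function names, and the sequential `draw` loop standing in for the pixel-per-processor step are assumptions made for illustration, not details from the report.

```python
import math


def on_line(px, py, x0, y0, x1, y1, tol=0.5):
    """Decide for one pixel whether it belongs to segment (x0, y0)-(x1, y1).

    Every pixel runs the same test, so the work per pixel does not depend
    on the line's length or orientation.
    """
    dx, dy = x1 - x0, y1 - y0
    sq = dx * dx + dy * dy
    if sq == 0:
        # Degenerate segment: treat it as a single point.
        return math.hypot(px - x0, py - y0) <= tol
    # Perpendicular distance from the pixel centre to the infinite line.
    dist = abs(dy * (px - x0) - dx * (py - y0)) / math.sqrt(sq)
    # Projection parameter keeps the pixel within the segment's extent.
    t = ((px - x0) * dx + (py - y0) * dy) / sq
    return dist <= tol and 0.0 <= t <= 1.0


def draw(width, height, x0, y0, x1, y1):
    """Sequential stand-in for the one-processor-per-pixel SIMD step."""
    return [[on_line(x, y, x0, y0, x1, y1) for x in range(width)]
            for y in range(height)]
```

On a SIMD machine the two nested comprehensions in `draw` would collapse into a single data-parallel step, with each processor holding its own `(px, py)`.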
NASA Astrophysics Data System (ADS)
Paksi, A. B. N.; Ma'ruf, A.
2016-02-01
In general, both machines and human resources are needed for processing a job on the production floor. However, most classical scheduling problems have ignored the possible constraint caused by the availability of workers and have considered only machines as a limited resource. In addition, along with the development of production technology, routing flexibility appears as a consequence of high product variety and medium demand for each product. Routing flexibility arises when machines are capable of more than one machining process. This paper presents a method to address a scheduling problem constrained by both machines and workers, considering routing flexibility. Scheduling in a Dual-Resource Constrained shop is categorized as an NP-hard problem that needs long computational time. A meta-heuristic approach based on a Genetic Algorithm is used due to its practical implementation in industry. The developed Genetic Algorithm uses an indirect chromosome representation and a procedure to transform a chromosome into a Gantt chart. Genetic operators, namely selection, elitism, crossover, and mutation, are developed to search for the best fitness value until a steady-state condition is achieved. A case study in a manufacturing SME is used, with minimizing tardiness as the objective function. The algorithm has shown a 25.6% reduction in tardiness, equal to 43.5 hours.
Binary Trees and Parallel Scheduling Algorithms.
1980-09-01
been processed for p. time units. If a job does not complete by its due time, it is tardy. In a nonpreemptive schedule, job i is scheduled to process...the preemptive schedule obtained by the algorithm of section 2.1.2 also minimizes 5Ti, this problem is easily solved in parallel. When lci is to e...August 1978, pp. 657-661. 14. Horn, W. A., "Some simple scheduling algorithms," Naval Res. Logist. Quart., Vol. 21, pp. 177-185, 1974. 15. Horowitz, E
Secure Autonomous Automated Scheduling (SAAS). Rev. 1.1
NASA Technical Reports Server (NTRS)
Walke, Jon G.; Dikeman, Larry; Sage, Stephen P.; Miller, Eric M.
2010-01-01
This report describes network-centric operations, where a virtual mission operations center autonomously receives sensor triggers, and schedules space and ground assets using Internet-based technologies and service-oriented architectures. For proof-of-concept purposes, sensor triggers are received from the United States Geological Survey (USGS) to determine targets for space-based sensors. The Surrey Satellite Technology Limited (SSTL) Disaster Monitoring Constellation satellite, the UK-DMC, is used as the space-based sensor. The UK-DMC's availability is determined via machine-to-machine communications using SSTL's mission planning system. Access to/from the UK-DMC for tasking and sensor data is via SSTL's and Universal Space Network's (USN) ground assets. The availability and scheduling of USN's assets can also be performed autonomously via machine-to-machine communications. All communication, both on the ground and between ground and space, uses open Internet standards.
Scheduling algorithm for flow shop with two batch-processing machines and arbitrary job sizes
NASA Astrophysics Data System (ADS)
Cheng, Bayi; Yang, Shanlin; Hu, Xiaoxuan; Li, Kai
2014-03-01
This article considers the problem of scheduling two batch-processing machines in a flow shop where the jobs have arbitrary sizes and the machines have limited capacity. The jobs are processed in batches and the total size of jobs in each batch cannot exceed the machine capacity. Once a batch is being processed, no interruption is allowed until all the jobs in it are completed. The problem of minimising makespan is NP-hard in the strong sense. First, we present a mathematical model of the problem as an integer programme. We show the scale of feasible solutions of the problem and provide optimality properties. Then, we propose a polynomial time algorithm with running time in O(n log n). The jobs are first assigned to feasible batches and then scheduled on machines. For the general case, we prove that the proposed algorithm has a performance guarantee of 4. For the special case where the processing times of each job on the two machines satisfy $p_{1j} = a p_{2j}$, the performance guarantee is ? for $a > 0$.
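The capacity constraint on batches can be illustrated with a generic first-fit-decreasing grouping. This is not the authors' algorithm, whose batching rule is not given in the abstract; the function name and the largest-job-first order are assumptions chosen only to show what a feasible batch assignment looks like.

```python
def first_fit_batches(sizes, capacity):
    """Group job indices into batches whose total size fits the capacity.

    Generic first-fit-decreasing sketch (assumed rule, not from the source).
    """
    batches = []  # list of lists of job indices
    loads = []    # total size currently placed in each batch
    # Consider larger jobs first; ties keep their original order.
    for job in sorted(range(len(sizes)), key=lambda j: -sizes[j]):
        if sizes[job] > capacity:
            raise ValueError("job %d does not fit on the machine" % job)
        for b, load in enumerate(loads):
            if load + sizes[job] <= capacity:
                batches[b].append(job)
                loads[b] += sizes[job]
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append([job])
            loads.append(sizes[job])
    return batches
```

Once batches are formed this way, each batch can be treated as a single unit when sequencing the two machines.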
Manipulating Tabu List to Handle Machine Breakdowns in Job Shop Scheduling Problems
NASA Astrophysics Data System (ADS)
Nababan, Erna Budhiarti; SalimSitompul, Opim
2011-06-01
Machine breakdowns in a production schedule may occur at random, which makes the well-known hard combinatorial Job Shop Scheduling Problem (JSSP) even more complex. One of the popular techniques used to solve combinatorial problems is Tabu Search. In this technique, moves that are not allowed to be revisited are retained in a tabu list in order to avoid returning to solutions that have been obtained previously. In this paper, we propose an algorithm that employs a second tabu list to keep broken machines, in addition to the tabu list that keeps the moves. The period for which the broken machines are kept on the list is categorized using a fuzzy membership function. Our technique is tested on the JSSP benchmark data available in the OR library. From the experiment, we found that our algorithm is promising for helping a decision maker to face machine breakdowns.
Parallel Computational Fluid Dynamics: Current Status and Future Requirements
NASA Technical Reports Server (NTRS)
Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)
1994-01-01
One of the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Ames Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize some of the experiences gained so far with the parallel testbed machines at the NAS Applied Research Branch. Then we discuss the long term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.
A Solution Method of Scheduling Problem with Worker Allocation by a Genetic Algorithm
NASA Astrophysics Data System (ADS)
Osawa, Akira; Ida, Kenichi
In the scheduling problem with worker allocation (SPWA) proposed by Iima et al., a worker's skill level is the same for every machine. In the real world, however, each worker has a different skill level for each machine. For that reason, we propose a new SPWA model in which a worker has a different skill level for each machine. To solve the problem, we propose a new GA for SPWA consisting of three new procedures: shortening of idle time, modifying infeasible solutions into feasible solutions, and a new selection method for the GA. The effectiveness of the proposed algorithm is clarified by numerical experiments using benchmark problems for job-shop scheduling.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Job shop scheduling problem with late work criterion
NASA Astrophysics Data System (ADS)
Piroozfard, Hamed; Wong, Kuan Yew
2015-05-01
Scheduling is considered a key task in many industries, for example project-based scheduling, crew scheduling, flight scheduling, and machine scheduling. In the machine scheduling area, job shop scheduling problems are considered important and highly complex, and they are characterized as NP-hard. Job shop scheduling problems with the late work criterion and non-preemptive jobs are addressed in this paper. The late work criterion is a fairly new objective function. It is a qualitative measure and is concerned with the late parts of the jobs, unlike classical objective functions, which are quantitative measures. In this work, simulated annealing is presented to solve the scheduling problem. In addition, an operation-based representation was used to encode the solution, and a neighbourhood search structure was employed to search for new solutions. The case studies are Lawrence instances taken from the Operations Research Library. Computational results of this probabilistic meta-heuristic algorithm were compared with a conventional genetic algorithm, and a conclusion was made based on the algorithm and problem.
Quantum information, cognition, and music
Dalla Chiara, Maria L.; Giuntini, Roberto; Leporini, Roberto; Negri, Eleonora; Sergioli, Giuseppe
2015-01-01
Parallelism represents an essential aspect of human mind/brain activities. One can recognize some common features between psychological parallelism and the characteristic parallel structures that arise in quantum theory and in quantum computation. The article is devoted to a discussion of the following questions: a comparison between classical probabilistic Turing machines and quantum Turing machines; possible applications of the quantum computational semantics to cognitive problems; parallelism in music. PMID:26539139
NASA Astrophysics Data System (ADS)
Sembiring, N.; Nasution, A. H.
2018-02-01
Corrective maintenance, i.e. replacing or repairing a machine component after the machine breaks down, is always done in a manufacturing company. It forces the production process to stop. Production time decreases because the maintenance team must replace or repair the damaged machine component. This paper proposes a preventive maintenance schedule for a critical component of a critical machine at a crude palm oil and kernel company in order to increase maintenance efficiency. Reliability Engineering & Maintenance Value Stream Mapping is used as a method and a tool to analyze the reliability of the component and to reduce waste in any process by segregating value-added and non-value-added activities.
Scheduling of flow shop problems on 3 machines in fuzzy environment with double transport facility
NASA Astrophysics Data System (ADS)
Sathish, Shakeela; Ganesan, K.
2016-06-01
Flow shop scheduling is a decision-making problem in the production and manufacturing field which has a significant impact on the performance of an organization. When the machines on which jobs are to be processed are located at different places, the transportation time plays a significant role in production. Further, two different transport agents are considered: the first takes the job from the 1st machine to the 2nd machine and then returns to the 1st machine, and the second takes the job from the 2nd machine to the 3rd machine and then returns to the 2nd machine. We propose a method to minimize the total makespan without converting the fuzzy processing times to classical numbers, by using a new type of fuzzy arithmetic and a fuzzy ranking method. A numerical example is provided to explain the proposed method.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...
2013-01-01
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation, that is, additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.
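The definition of work time inflation given in this abstract can be turned into a small calculation. The function name, the choice to report both the extra time and a ratio, and the sample numbers in the usage note are assumptions for illustration; the paper's own measurement method is not described here.

```python
def work_time_inflation(parallel_work_times, sequential_time):
    """Return (extra_time, ratio) for one run.

    parallel_work_times: time each thread spent doing useful work only;
    idle time and scheduling overhead are assumed to be excluded already.
    sequential_time: time to perform the same work in a sequential run.
    """
    if sequential_time <= 0:
        raise ValueError("sequential_time must be positive")
    total = sum(parallel_work_times)
    return total - sequential_time, total / sequential_time
```

For example, four threads that each spend between 3.0 and 3.5 seconds working on a job that takes 10 seconds sequentially have inflated the work, even if the wall-clock time went down.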
The paradigm compiler: Mapping a functional language for the connection machine
NASA Technical Reports Server (NTRS)
Dennis, Jack B.
1989-01-01
The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
An M-step preconditioned conjugate gradient method for parallel computation
NASA Technical Reports Server (NTRS)
Adams, L.
1983-01-01
This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
Permutation flow-shop scheduling problem to optimize a quadratic objective function
NASA Astrophysics Data System (ADS)
Ren, Tao; Zhao, Peng; Zhang, Da; Liu, Bingqian; Yuan, Huawei; Bai, Danyu
2017-09-01
A flow-shop scheduling model enables appropriate sequencing for each job and for processing on a set of machines in compliance with identical processing orders. The objective is to achieve a feasible schedule for optimizing a given criterion. Permutation is a special setting of the model in which the processing order of the jobs on the machines is identical for each subsequent step of processing. This article addresses the permutation flow-shop scheduling problem to minimize the criterion of total weighted quadratic completion time. With a probability hypothesis, the asymptotic optimality of the weighted shortest processing time schedule under a consistency condition (WSPT-CC) is proven for sufficiently large-scale problems. However, the worst case performance ratio of the WSPT-CC schedule is the square of the number of machines in certain situations. A discrete differential evolution algorithm, where a new crossover method with multiple-point insertion is used to improve the final outcome, is presented to obtain high-quality solutions for moderate-scale problems. A sequence-independent lower bound is designed for pruning in a branch-and-bound algorithm for small-scale problems. A set of random experiments demonstrates the performance of the lower bound and the effectiveness of the proposed algorithms.
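To make the objective concrete, the sketch below evaluates the total weighted quadratic completion time of a permutation schedule and builds a weighted-shortest-processing-time order. The priority index used in `wspt_order` (total processing time divided by weight) is an assumption; the abstract does not spell out the consistency condition behind WSPT-CC, and all names here are illustrative.

```python
def completion_times(perm, proc):
    """Completion time of each job on the last machine.

    proc[j][k] is the processing time of job j on machine k; every machine
    processes the jobs in the same order, given by perm.
    """
    machines = len(proc[0])
    ready = [0] * machines  # time at which each machine becomes free
    done = {}
    for j in perm:
        t = 0
        for k in range(machines):
            # A job starts on machine k once it has left machine k-1
            # and machine k has finished the previous job.
            t = max(t, ready[k]) + proc[j][k]
            ready[k] = t
        done[j] = t
    return done


def weighted_quadratic_cost(perm, proc, weights):
    """Sum over jobs of weight times squared completion time."""
    done = completion_times(perm, proc)
    return sum(weights[j] * done[j] ** 2 for j in perm)


def wspt_order(proc, weights):
    """Jobs by non-decreasing total processing time per unit weight (assumed index)."""
    return sorted(range(len(proc)), key=lambda j: sum(proc[j]) / weights[j])
```

The same `weighted_quadratic_cost` function could serve as the fitness evaluation inside a heuristic or as the leaf evaluation in a branch-and-bound search.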
Job shop scheduling model for non-identic machine with fixed delivery time to minimize tardiness
NASA Astrophysics Data System (ADS)
Kusuma, K. K.; Maruf, A.
2016-02-01
Scheduling problems for non-identical machines with low utilization and fixed delivery times are frequent in the manufacturing industry. This paper proposes a mathematical model to minimize total tardiness for non-identical machines in a job shop environment. The model is an integer linear programming model and uses a branch and bound algorithm as the solution method. We use fixed delivery time as the main constraint and different processing times to process a job. The results of the proposed model show that the utilization of production machines can be increased with minimal tardiness using fixed delivery time as a constraint.
Learning dominance relations in combinatorial search problems
NASA Technical Reports Server (NTRS)
Yu, Chee-Fen; Wah, Benjamin W.
1988-01-01
Dominance relations commonly are used to prune unnecessary nodes in search graphs, but they are problem-dependent and cannot be derived by a general procedure. The authors identify machine learning of dominance relations and the applicable learning mechanisms. A study of learning dominance relations using learning by experimentation is described. This system has been able to learn dominance relations for the 0/1-knapsack problem, an inventory problem, the reliability-by-replication problem, the two-machine flow shop problem, a number of single-machine scheduling problems, and a two-machine scheduling problem. It is considered that the same methodology can be extended to learn dominance relations in general.
NASA Astrophysics Data System (ADS)
Zhadanovsky, Boris; Sinenko, Sergey
2018-03-01
Economic indicators of construction work, particularly in high-rise construction, are directly related to the choice of optimal number of machines. The shortage of machinery makes it impossible to complete the construction & installation work on scheduled time. Rates of performance of construction & installation works and labor productivity during high-rise construction largely depend on the degree of provision of construction project with machines (level of work mechanization). During calculation of the need for machines in construction projects, it is necessary to ensure that work is completed on scheduled time, increased level of complex mechanization, increased productivity and reduction of manual work, and improved usage and maintenance of machine fleet. The selection of machines and determination of their numbers should be carried out by using formulas presented in this work.
A survey of planning and scheduling research at the NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Zweben, Monte
1989-01-01
NASA Ames Research Center has a diverse program in planning and scheduling. Some research projects as well as some applications are highlighted. Topics addressed include machine learning techniques, action representations and constraint-based scheduling systems. The applications discussed are planetary rovers, Hubble Space Telescope scheduling, and Pioneer Venus orbit scheduling.
NASA Astrophysics Data System (ADS)
Toporkov, D. M.; Vialcev, G. B.
2017-10-01
Parallel branches are commonly used in manufacturing to realize fractional slot concentrated windings in electrical machines. If rotor eccentricity is present in a machine with parallel branches, equalizing currents can arise. This paper discusses an approach to simulating the equalizing currents in the parallel branches of an electrical machine winding, based on magnetic field calculation using the Finite Element Method. The high accuracy of the model is provided by dynamically updating the inductances in the system of differential equations describing the machine. Pre-computed tabulated flux linkage functions are used for that purpose. These functions give the flux linkage of the parallel branches as a function of the branch currents and the rotor position angle. They make it possible to calculate self-inductances and mutual inductances by partial differentiation. The calculated results obtained for an electric machine specimen are presented. The results show that an adverse combination of design solutions and rotor eccentricity leads to high equalizing currents and winding heating. Additional torque ripples also arise. The harmonic content of the additional ripples is not similar to that of the cogging torque or of the ripples caused by the rotor eccentricity.
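The partial-derivative step can be written out as follows. The symbols are assumptions, since the abstract names none: $\psi_j$ is the tabulated flux linkage of parallel branch $j$, $i_k$ is the current in branch $k$, $\theta$ is the rotor position angle, and $n$ is the number of parallel branches.

```latex
L_{jk}(i_1,\dots,i_n,\theta)
  = \frac{\partial \psi_j(i_1,\dots,i_n,\theta)}{\partial i_k},
\qquad j,k = 1,\dots,n
```

Under this notation, $j = k$ gives the self-inductance of a branch and $j \neq k$ gives the mutual inductance between two branches; both would be re-evaluated from the tables as the currents and rotor angle change.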
NAS Requirements Checklist for Job Queuing/Scheduling Software
NASA Technical Reports Server (NTRS)
Jones, James Patton
1996-01-01
The increasing reliability of parallel systems and clusters of computers has resulted in these systems becoming more attractive for true production workloads. Today, the primary obstacle to production use of clusters of computers is the lack of a functional and robust Job Management System for parallel applications. This document provides a checklist of NAS requirements for job queuing and scheduling in order to make most efficient use of parallel systems and clusters for parallel applications. Future requirements are also identified to assist software vendors with design planning.
RTNN: The New Parallel Machine in Zaragoza
NASA Astrophysics Data System (ADS)
Sijs, A. J. V. D.
I report on the development of RTNN, a parallel computer designed as a 4^4 hypercube of 256 T9000 transputer nodes, each with 8 MB memory. The peak performance of the machine is expected to be 2.5 Gflops.
Six Years of Parallel Computing at NAS (1987 - 1993): What Have we Learned?
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Cooper, D. M. (Technical Monitor)
1994-01-01
In the fall of 1987 the age of parallelism at NAS began with the installation of a 32K processor CM-2 from Thinking Machines. In 1987 this was described as an "experiment" in parallel processing. In the six years since, NAS acquired a series of parallel machines, and conducted an active research and development effort focused on the use of highly parallel machines for applications in the computational aerosciences. In this time period parallel processing for scientific applications evolved from a fringe research topic into one of the main activities at NAS. In this presentation I will review the history of parallel computing at NAS in the context of the major progress that has been made in the field in general. I will attempt to summarize the lessons we have learned so far, and the contributions NAS has made to the state of the art. Based on these insights I will comment on the current state of parallel computing (including the HPCC effort) and try to predict some trends for the next six years.
Scheduling for Locality in Shared-Memory Multiprocessors
1993-05-01
Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy... architecture on parallel program performance, explain the implications of this trend on popular parallel programming models, and propose system software to...decomposition and scheduling algorithms. Subject terms: shared-memory multiprocessors; architecture trends; loop scheduling
NASA Astrophysics Data System (ADS)
Lu, Yuan-Yuan; Wang, Ji-Bo; Ji, Ping; He, Hongyu
2017-09-01
In this article, single-machine group scheduling with learning effects and convex resource allocation is studied. The goal is to find the optimal job schedule, the optimal group schedule, and resource allocations of jobs and groups. For the problem of minimizing the makespan subject to limited resource availability, it is proved that the problem can be solved in polynomial time under the condition that the setup times of groups are independent. For the general setup times of groups, a heuristic algorithm and a branch-and-bound algorithm are proposed, respectively. Computational experiments show that the performance of the heuristic algorithm is fairly accurate in obtaining near-optimal solutions.
Satellite antenna management system and method
NASA Technical Reports Server (NTRS)
Leath, Timothy T (Inventor); Azzolini, John D (Inventor)
1999-01-01
The antenna management system and method allow a satellite to communicate with a ground station either directly or by an intermediary of a second satellite, thus permitting communication even when the satellite is not within range of the ground station. The system and method employ five major software components, which are the control and initialization module, the command and telemetry handler module, the contact schedule processor module, the contact state machine module, and the telemetry state machine module. The control and initialization module initializes the system and operates the main control cycle, in which the other modules are called. The command and telemetry handler module handles communication to and from the ground station. The contact schedule processor module handles the contact entry schedules to allow scheduling of contacts with the second satellite. The contact and telemetry state machine modules handle the various states of the satellite in beginning, maintaining and ending contact with the second satellite and in beginning, maintaining and ending communication with the satellite.
NASA Technical Reports Server (NTRS)
Quealy, Angela; Cole, Gary L.; Blech, Richard A.
1993-01-01
The Application Portable Parallel Library (APPL) is a subroutine-based library of communication primitives that is callable from applications written in FORTRAN or C. APPL provides a consistent programmer interface to a variety of distributed and shared-memory multiprocessor MIMD machines. The objective of APPL is to minimize the effort required to move parallel applications from one machine to another, or to a network of homogeneous machines. APPL encompasses many of the message-passing primitives that are currently available on commercial multiprocessor systems. This paper describes APPL (version 2.3.1) and its usage, reports the status of the APPL project, and indicates possible directions for the future. Several applications using APPL are discussed, as well as performance and overhead results.
Parallel-aware, dedicated job co-scheduling within/across symmetric multiprocessing nodes
Jones, Terry R.; Watson, Pythagoras C.; Tuel, William; Brenner, Larry; Caffrey, Patrick; Fier, Jeffrey
2010-10-05
In a parallel computing environment comprising a network of SMP nodes each having at least one processor, a parallel-aware co-scheduling method and system for improving the performance and scalability of a dedicated parallel job having synchronizing collective operations. The method and system uses a global co-scheduler and an operating system kernel dispatcher adapted to coordinate interfering system and daemon activities on a node and across nodes to promote intra-node and inter-node overlap of said interfering system and daemon activities as well as intra-node and inter-node overlap of said synchronizing collective operations. In this manner, the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, on synchronizing collective operations is minimized on large processor-count SPMD bulk-synchronous programming styles.
Decentralized Control of Scheduling in Distributed Systems.
1983-03-18
the job scheduling algorithm adapts to the changing busyness of the various hosts in the system. The environment in which the job scheduling entities...resources and processes that constitute the node and a set of interfaces for accessing these processes and resources. The structure of a node could change ...parallel. Chang [CHNG82] has also described some algorithms for detecting properties of general graphs by traversing paths in a graph in parallel. One of
Optimization-based manufacturing scheduling with multiple resources and setup requirements
NASA Astrophysics Data System (ADS)
Chen, Dong; Luh, Peter B.; Thakur, Lakshman S.; Moreno, Jack, Jr.
1998-10-01
The increasing demand for on-time delivery and low prices forces manufacturers to seek effective schedules that improve the coordination of multiple resources and reduce internal product costs associated with labor, setup and inventory. This study describes the design and implementation of a scheduling system for J. M. Product Inc., whose manufacturing is characterized by the need to consider machines and operators simultaneously, where an operator may attend several operations at the same time, and by the presence of machines requiring significant setup times. Scheduling problems with these characteristics are typical for many manufacturers, are very difficult to handle, and have not been adequately addressed in the literature. In this study, both machines and operators are modeled as resources with finite capacities to obtain efficient coordination between them, and an operator's time can be shared by several operations at the same time to make full use of the operator. Setups are explicitly modeled following our previous work, with additional penalties on excessive setups to reduce setup costs and avoid possible scrap. An integer formulation with a separable structure is developed to maximize on-time delivery of products while keeping inventory low and the number of setups small. Within the Lagrangian relaxation framework, the problem is decomposed into individual subproblems that are effectively solved by using dynamic programming with additional penalties embedded in state transitions. A heuristic is then developed to obtain a feasible schedule, building on our previous work, with a new mechanism to satisfy operator capacity constraints. The method has been implemented in the object-oriented programming language C++ with a user-friendly interface, and numerical testing shows that the method generates high-quality schedules in a timely fashion.
Through simultaneous consideration of machines and operators, the two are well coordinated to facilitate the smooth flow of parts through the system. The explicit modeling of setups and the associated penalties causes parts with the same setup requirements to be clustered together, avoiding excessive setups.
Concurrent computation of attribute filters on shared memory parallel machines.
Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold
2008-10-01
Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
A real-time MPEG software decoder using a portable message-passing library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwong, Man Kam; Tang, P.T. Peter; Lin, Biquan
1995-12-31
We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4 and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment it uses. Several technical issues are discussed, including balancing of decoding speed, memory limitation, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible in a general-purpose parallel machine.
NASA Technical Reports Server (NTRS)
Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke
1989-01-01
Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, we cannot apply the Do-all or Do-across techniques for parallel processing of the simulation, since there exist data dependencies from the end of an iteration to the beginning of the next iteration, and furthermore data input and data output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and assigned to processors by using optimal static scheduling at compile time in order to reduce the large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to extract advantageous features of static scheduling algorithms to the maximum extent.
A survey of planning and scheduling research at the NASA Ames Research Center
NASA Technical Reports Server (NTRS)
Zweben, Monte
1988-01-01
NASA Ames Research Center has a diverse program in planning and scheduling. This paper highlights some of our research projects as well as some of our applications. Topics addressed include machine learning techniques, action representations and constraint-based scheduling systems. The applications discussed are planetary rovers, Hubble Space Telescope scheduling, and Pioneer Venus orbit scheduling.
New Integrated Modeling Capabilities: MIDAS' Recent Behavioral Enhancements
NASA Technical Reports Server (NTRS)
Gore, Brian F.; Jarvis, Peter A.
2005-01-01
The Man-machine Integration Design and Analysis System (MIDAS) is an integrated human performance modeling software tool that is based on mechanisms that underlie and cause human behavior. A PC-Windows version of MIDAS has been created that integrates the anthropometric character "Jack (TM)" with MIDAS' validated perceptual and attention mechanisms. MIDAS now models multiple simulated humans engaging in goal-related behaviors. New capabilities include the ability to predict situations in which errors and/or performance decrements are likely due to a variety of factors including concurrent workload and performance influencing factors (PIFs). This paper describes a new model that predicts the effects of microgravity on a mission specialist's performance, and its first application to simulating the task of conducting a Life Sciences experiment in space according to a sequential or parallel schedule of performance.
Stochastic scheduling on a repairable manufacturing system
NASA Astrophysics Data System (ADS)
Li, Wei; Cao, Jinhua
1995-08-01
In this paper, we consider some stochastic scheduling problems with a set of stochastic jobs on a manufacturing system with a single machine that is subject to multiple breakdowns and repairs. When the machine processing a job fails, the job processing must restart some time later when the machine is repaired. For this typical manufacturing system, we find the optimal policies that minimize the following objective functions: (1) the weighted sum of the completion times; (2) the weighted number of late jobs having constant due dates; (3) the weighted number of late jobs having random due dates exponentially distributed, which generalize some previous results.
Solution of a tridiagonal system of equations on the finite element machine
NASA Technical Reports Server (NTRS)
Bostic, S. W.
1984-01-01
Two parallel algorithms for the solution of tridiagonal systems of equations were implemented on the Finite Element Machine. The Accelerated Parallel Gauss method, an iterative method, and the Buneman algorithm, a direct method, are discussed and execution statistics are presented.
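For orientation only, the sketch below shows a standard sequential solver for a tridiagonal system (the Thomas algorithm). It is not the Accelerated Parallel Gauss method or the Buneman algorithm discussed in the record, and the function name and argument layout are assumptions made for this illustration; it merely shows the kind of problem those parallel methods address.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the sequential Thomas algorithm.

    lower[i] is the sub-diagonal entry of row i (lower[0] is ignored),
    diag[i] the main diagonal, upper[i] the super-diagonal entry of row i
    (upper[-1] is ignored). No pivoting is done, so the matrix is assumed
    to be well conditioned (for example, diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The forward sweep carries a dependency from each row to the next, which is why parallel machines need reformulated methods such as those named in the abstract.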
21 CFR 1310.16 - Exemptions for certain scheduled listed chemical products.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 21 Food and Drugs 9 2011-04-01 2011-04-01 false Exemptions for certain scheduled listed chemical... RECORDS AND REPORTS OF LISTED CHEMICALS AND CERTAIN MACHINES § 1310.16 Exemptions for certain scheduled listed chemical products. (a) Upon the application of a manufacturer of a scheduled listed chemical...
Implementation of a parallel unstructured Euler solver on the CM-5
NASA Technical Reports Server (NTRS)
Morano, Eric; Mavriplis, D. J.
1995-01-01
An efficient unstructured 3D Euler solver is parallelized on a Thinking Machines Corporation Connection Machine 5, a distributed memory computer with vectoring capability. In this paper, the single instruction multiple data (SIMD) strategy is employed through the use of the CM Fortran language and the CMSSL scientific library. The performance of the CMSSL mesh partitioner is evaluated and the overall efficiency of the parallel flow solver is discussed.
Scheduling Jobs and a Variable Maintenance on a Single Machine with Common Due-Date Assignment
Wan, Long
2014-01-01
We investigate a common due-date assignment scheduling problem with a variable maintenance on a single machine. The goal is to minimize the total earliness, tardiness, and due-date cost. We derive some properties on an optimal solution for our problem. For a special case with identical jobs we propose an optimal polynomial time algorithm followed by a numerical example. PMID:25147861
Performance of a plasma fluid code on the Intel parallel computers
NASA Technical Reports Server (NTRS)
Lynch, V. E.; Carreras, B. A.; Drake, J. B.; Leboeuf, J. N.; Liewer, P.
1992-01-01
One approach to improving the real-time efficiency of plasma turbulence calculations is to use a parallel algorithm. A parallel algorithm for plasma turbulence calculations was tested on the Intel iPSC/860 hypercube and the Touchstone Delta machine. Using the 128 processors of the Intel iPSC/860 hypercube, a factor of 5 improvement over a single-processor CRAY-2 is obtained. For the Touchstone Delta machine, the corresponding improvement factor is 16. For plasma edge turbulence calculations, an extrapolation of the present results to the Intel (sigma) machine gives an improvement factor close to 64 over the single-processor CRAY-2.
On the suitability of the connection machine for direct particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonard
1990-01-01
The algorithmic structure of the vectorizable Stanford particle simulation (SPS) method is examined and reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel form, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation, and a master/slave algorithm is developed to minimize communication cost in large table look-ups. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of the lack of a broad-based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
32 CFR 701.53 - FOIA fee schedule.
Code of Federal Regulations, 2014 CFR
2014-07-01
... human time) and machine time. (1) Human time. Human time is all the time spent by humans performing the...) Machine time. Machine time involves only direct costs of the central processing unit (CPU), input/output... exist to calculate CPU time, no machine costs can be passed on to the requester. When CPU calculations...
32 CFR 701.53 - FOIA fee schedule.
Code of Federal Regulations, 2012 CFR
2012-07-01
... human time) and machine time. (1) Human time. Human time is all the time spent by humans performing the...) Machine time. Machine time involves only direct costs of the central processing unit (CPU), input/output... exist to calculate CPU time, no machine costs can be passed on to the requester. When CPU calculations...
32 CFR 701.53 - FOIA fee schedule.
Code of Federal Regulations, 2013 CFR
2013-07-01
... human time) and machine time. (1) Human time. Human time is all the time spent by humans performing the...) Machine time. Machine time involves only direct costs of the central processing unit (CPU), input/output... exist to calculate CPU time, no machine costs can be passed on to the requester. When CPU calculations...
NASA Technical Reports Server (NTRS)
Saini, Subash; Bailey, David; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
High Performance Fortran (HPF), the high-level language for parallel Fortran programming, is based on Fortran 90. HPF was defined by an informal standards committee known as the High Performance Fortran Forum (HPFF) in 1993, and modeled on TMC's CM Fortran language. Several HPF features have since been incorporated into the draft ANSI/ISO Fortran 95, the next formal revision of the Fortran standard. HPF allows users to write a single parallel program that can execute on a serial machine, a shared-memory parallel machine, or a distributed-memory parallel machine. HPF eliminates the complex, error-prone task of explicitly specifying how, where, and when to pass messages between processors on distributed-memory machines, or when to synchronize processors on shared-memory machines. HPF is designed in a way that allows the programmer to code an application at a high level, and then selectively optimize portions of the code by dropping into message-passing or calling tuned library routines as 'extrinsics'. Compilers supporting High Performance Fortran features first appeared in late 1994 and early 1995 from Applied Parallel Research (APR), Digital Equipment Corporation, and The Portland Group (PGI). IBM introduced an HPF compiler for the IBM RS/6000 SP/2 in April of 1996. Over the past two years, these implementations have shown steady improvement in terms of both features and performance. The performance of various hardware/programming model (HPF and MPI (message passing interface)) combinations will be compared, based on the latest NAS (NASA Advanced Supercomputing) Parallel Benchmark (NPB) results, thus providing a cross-machine and cross-model comparison. Specifically, HPF-based NPB results will be compared with MPI-based NPB results to provide perspective on the performance currently obtainable using HPF versus MPI or versus hand-tuned implementations such as those supplied by the hardware vendors.
In addition, we also present NPB (Version 1.0) performance results for the following systems: DEC Alpha Server 8400 5/440, Fujitsu VPP Series (VX, VPP300, and VPP700), HP/Convex Exemplar SPP2000, IBM RS/6000 SP P2SC node (120 MHz), NEC SX-4/32, SGI/CRAY T3E, SGI Origin2000.
The FORCE - A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
This paper explains why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
The FORCE: A highly portable parallel programming language
NASA Technical Reports Server (NTRS)
Jordan, Harry F.; Benten, Muhammad S.; Alaghband, Gita; Jakob, Ruediger
1989-01-01
Here, it is explained why the FORCE parallel programming language is easily portable among six different shared-memory multiprocessors, and how a two-level macro preprocessor makes it possible to hide low-level machine dependencies and to build machine-independent high-level constructs on top of them. These FORCE constructs make it possible to write portable parallel programs largely independent of the number of processes and the specific shared-memory multiprocessor executing them.
Car painting process scheduling with harmony search algorithm
NASA Astrophysics Data System (ADS)
Syahputra, M. F.; Maiyasya, A.; Purnamawati, S.; Abdullah, D.; Albra, W.; Heikal, M.; Abdurrahman, A.; Khaddafi, M.
2018-02-01
Automotive painting uses robots to paint the car body, which improves the efficiency of the production system. The production system becomes more efficient still if car orders are scheduled with the body shape to be painted taken into account. Flow shop scheduling is a scheduling model in which all jobs to be processed flow along the same production path. Scheduling problems arise when n jobs must be processed on the machines and it must be decided which job is done first and how jobs are allocated to the machines to obtain a scheduled production process. The Harmony Search Algorithm is a metaheuristic optimization algorithm based on music. It is inspired by the way musicians search for perfect harmony, a search that is analogous to finding the optimum in an optimization process. In the tests performed, the algorithm obtained the optimal car sequence with the minimum makespan value.
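To make the makespan objective of this record concrete, here is a minimal sketch of how the makespan of a permutation flow shop can be evaluated for a given job order. It is not the authors' harmony search implementation; the function name and the processing times are hypothetical values chosen for illustration. A metaheuristic such as harmony search would call an evaluation like this repeatedly while searching over job orders.

```python
def flow_shop_makespan(order, proc_times):
    """Return the makespan of a permutation flow shop.

    order      -- job indices in the order they enter the line
    proc_times -- proc_times[job][machine] is a processing time
                  (hypothetical data, not taken from the paper)
    """
    n_machines = len(proc_times[0])
    # finish[m] is the completion time of the latest job on machine m
    finish = [0] * n_machines
    for job in order:
        for m in range(n_machines):
            # The job can start on machine m once it has left machine m-1
            # and machine m has finished the previous job.
            ready = finish[m - 1] if m > 0 else 0
            finish[m] = max(finish[m], ready) + proc_times[job][m]
    return finish[-1]


# Hypothetical processing times for 3 car bodies on 2 stations
times = [[3, 2], [1, 4], [2, 1]]
```

With these made-up times, different job orders give different makespans, which is the quantity the sequencing step tries to minimize.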
Experiments with a Parallel Multi-Objective Evolutionary Algorithm for Scheduling
NASA Technical Reports Server (NTRS)
Brown, Matthew; Johnston, Mark D.
2013-01-01
Evolutionary multi-objective algorithms have great potential for scheduling in those situations where tradeoffs among competing objectives represent a key requirement. One challenge, however, is runtime performance, as a consequence of evolving not just a single schedule, but an entire population, while attempting to sample the Pareto frontier as accurately and uniformly as possible. The growing availability of multi-core processors in end user workstations, and even laptops, has raised the question of the extent to which such hardware can be used to speed up evolutionary algorithms. In this paper we report on early experiments in parallelizing a Generalized Differential Evolution (GDE) algorithm for scheduling long-range activities on NASA's Deep Space Network. Initial results show that significant speedups can be achieved, but that performance does not necessarily improve as more cores are utilized. We describe our preliminary results and some initial suggestions from parallelizing the GDE algorithm. Directions for future work are outlined.
Code of Federal Regulations, 2011 CFR
2011-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2013 CFR
2013-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2010 CFR
2010-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2014 CFR
2014-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Code of Federal Regulations, 2012 CFR
2012-07-01
... approved, accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. 18.95 Section 18.95..., accepted or certified under Bureau of Mines Schedule 2D, 2E, 2F, or 2G. Machines for which field approval... 2D, 2E, 2F, or 2G, shall be approved following a determination by the electrical representative that...
Study on Parallel 2-DOF Rotation Machanism in Radar
NASA Astrophysics Data System (ADS)
Jiang, Ming; Hu, Xuelong; Liu, Lei; Yu, Yunfei
The spherical parallel machine has become a focus of academic and industrial attention worldwide in recent years because it is simple and economical to manufacture and because its compact structure is especially suitable for applications in which spatial orientation changes. This paper reviews current research and development at home and abroad. The newer machine (RGRR-II) can rotate around the z axis through 360° and around the y1 axis from -90° to +90°. Its advantages include fewer moving parts (only 3), a larger ratio of workspace to machine size, zero mechanical coupling, and no singularity. A rotation machine built with the spherical parallel 2-DOF rotation joint (RGRR-II) can realize hemispherical movement with zero dead points and extend the range. A control card (PA8000NT Series CNC) is installed in the computer and runs the software that controls the radar movement. The machine meets the needs of radars on aircraft and satellites, which require a larger detection range, lighter weight and a more compact structure.
Paging memory from random access memory to backing storage in a parallel computer
Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E
2013-05-21
Paging memory from random access memory ("RAM") to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Ang; Song, Shuaiwen; Brugel, Eric
To continuously comply with Moore’s Law, modern parallel machines become increasingly complex. Effectively tuning application performance for these machines therefore becomes a daunting task. Moreover, identifying performance bottlenecks at application and architecture level, as well as evaluating various optimization strategies, are becoming extremely difficult when the entanglement of numerous correlated factors is being presented. To tackle these challenges, we present a visual analytical model named “X”. It is intuitive and sufficiently flexible to track all the typical features of a parallel machine.
Incentive Compatible Online Scheduling of Malleable Parallel Jobs with Individual Deadlines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, Thomas E.; Grosu, Daniel
2010-09-13
We consider the online scheduling of malleable jobs on parallel systems, such as clusters, symmetric multiprocessing computers, and multi-core processor computers. Malleable jobs are a model of parallel processing in which jobs adapt to the number of processors assigned to them. This model permits the scheduler and resource manager to make more efficient use of the available resources. Each malleable job is characterized by arrival time, deadline, and value. If the job completes by its deadline, the user earns the payoff indicated by the value; otherwise, she earns a payoff of zero. The scheduling objective is to maximize the sum of the values of the jobs that complete by their associated deadlines. Complicating the matter is that users in the real world are rational and they will attempt to manipulate the scheduler by misreporting their jobs' parameters if it benefits them to do so. To mitigate this behavior, we design an incentive compatible online scheduling mechanism. Incentive compatibility assures us that the users will obtain the maximum payoff only if they truthfully report their jobs' parameters to the scheduler. Finally, we simulate and study the mechanism to show the effects of misreports on the cheaters and on the system.
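The scheduling objective stated in this record (each job earns its value only if it completes by its deadline) can be written down in a few lines. The sketch below is only an illustration of that objective, not the authors' mechanism; the `Job` class, the `total_value` function and the idea of passing completion times as a parallel list are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    arrival: float   # time the job is submitted
    deadline: float  # latest completion time that still earns the value
    value: float     # payoff earned if the job completes by its deadline


def total_value(jobs, completion_times):
    """Sum the values of jobs that complete by their deadlines.

    completion_times[i] is the (hypothetical) time at which jobs[i]
    finishes under some schedule; late jobs contribute zero.
    """
    return sum(
        job.value
        for job, done in zip(jobs, completion_times)
        if done <= job.deadline
    )
```

A scheduler would aim to choose processor allocations so that this total is as large as possible.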
NASA Technical Reports Server (NTRS)
Agrawal, Gagan; Sussman, Alan; Saltz, Joel
1993-01-01
Scientific and engineering applications often involve structured meshes. These meshes may be nested (for multigrid codes) and/or irregularly coupled (called multiblock or irregularly coupled regular mesh problems). A combined runtime and compile-time approach for parallelizing these applications on distributed memory parallel machines in an efficient and machine-independent fashion was described. A runtime library which can be used to port these applications on distributed memory machines was designed and implemented. The library is currently implemented on several different systems. To further ease the task of application programmers, methods were developed for integrating this runtime library with compilers for HPF-like parallel programming languages. How this runtime library was integrated with the Fortran 90D compiler being developed at Syracuse University is discussed. Experimental results to demonstrate the efficacy of our approach are presented. A multiblock Navier-Stokes solver template and a multigrid code were experimented with. Our experimental results show that our primitives have low runtime communication overheads. Further, the compiler parallelized codes perform within 20 percent of the code parallelized by manually inserting calls to the runtime library.
Options for Parallelizing a Planning and Scheduling Algorithm
NASA Technical Reports Server (NTRS)
Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.
2011-01-01
Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges helps us with an initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.
Automated Solar Module Assembly Line
NASA Technical Reports Server (NTRS)
Bycer, M.
1979-01-01
The gathering of information that led to the design approach of the machine, and a summary of the findings in the areas of study along with a description of each station of the machine are discussed. The machine is a cell stringing and string applique machine which is flexible in design, capable of handling a variety of cells and assembling strings of cells which can then be placed in a matrix up to 4 ft x 2 ft in series or parallel arrangement. The target machine cycle is to be 5 seconds per cell. This machine is primarily adapted to 100 mm round cells with one or two tabs between cells. It places finished strings of up to twelve cells in a matrix of up to six such strings arranged in series or in parallel.
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
2016-09-09
prenticeship Scheduling (COVAS), which performs machine learning using human expert demonstration, in conjunction with optimization, to automatically and ef...ficiently produce optimal solutions to challenging real-world scheduling problems. COVAS first learns a policy from human scheduling demonstration via...apprenticeship learning, then uses this initial solution to provide a tight bound on the value of the optimal solution, thereby substantially
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2000, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is to 1) develop highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, and 3) incorporate newly developed algorithms into actual simulation packages. The work plan has been well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) adopting a mathematical geometry which has a better capacity to describe the fluid, and (2) using a compact scheme to gain high order accuracy in numerical discretization.
The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Assessment of New Load Schedules for the Machine Calibration of a Force Balance
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Gisler, R.; Kew, R.
2015-01-01
New load schedules for the machine calibration of a six-component force balance are currently being developed and evaluated at the NASA Ames Balance Calibration Laboratory. One of the proposed load schedules is discussed in the paper. It has a total of 2082 points that are distributed across 16 load series. Several criteria were applied to define the load schedule. It was decided, for example, to specify the calibration load set in force balance format as this approach greatly simplifies the definition of the lower and upper bounds of the load schedule. In addition, all loads are assumed to be applied in a calibration machine by using the one-factor-at-a-time approach. At first, all single-component loads are applied in six load series. Then, three two-component load series are applied. They consist of the load pairs (N1, N2), (S1, S2), and (RM, AF). Afterwards, four three-component load series are applied. They consist of the combinations (N1, N2, AF), (S1, S2, AF), (N1, N2, RM), and (S1, S2, RM). In the next step, one four-component load series is applied. It is the load combination (N1, N2, S1, S2). Finally, two five-component load series are applied. They are the load combination (N1, N2, S1, S2, AF) and (N1, N2, S1, S2, RM). The maximum difference between loads of two subsequent data points of the load schedule is limited to 33 % of capacity. This constraint helps avoid unwanted load "jumps" in the load schedule that can have a negative impact on the performance of a calibration machine. Only loadings of the single- and two-component load series are loaded to 100 % of capacity. This approach was selected because it keeps the total number of calibration points to a reasonable limit while still allowing for the application of some of the more complex load combinations. Data from two of NASA's force balances is used to illustrate important characteristics of the proposed 2082-point calibration load schedule.
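The load-series breakdown in the abstract above can be tallied with a short sketch. The component combinations are taken from the abstract; the six single-component series are assumed here to correspond to the six components named in those combinations (N1, N2, S1, S2, RM, AF), and the helper name max_step_ok is hypothetical.

```python
# Load series listed in the abstract, grouped by number of components.
# Assumption: the six single-component series are N1, N2, S1, S2, RM, AF.
SINGLE = [("N1",), ("N2",), ("S1",), ("S2",), ("RM",), ("AF",)]
TWO = [("N1", "N2"), ("S1", "S2"), ("RM", "AF")]
THREE = [("N1", "N2", "AF"), ("S1", "S2", "AF"),
         ("N1", "N2", "RM"), ("S1", "S2", "RM")]
FOUR = [("N1", "N2", "S1", "S2")]
FIVE = [("N1", "N2", "S1", "S2", "AF"), ("N1", "N2", "S1", "S2", "RM")]

ALL_SERIES = SINGLE + TWO + THREE + FOUR + FIVE


def max_step_ok(loads_percent, limit=33.0):
    """Check that consecutive loads (percent of capacity) differ by at most `limit`."""
    return all(abs(b - a) <= limit
               for a, b in zip(loads_percent, loads_percent[1:]))
```

Counting the entries of ALL_SERIES reproduces the 16 load series stated in the abstract (6 + 3 + 4 + 1 + 2), and max_step_ok illustrates the 33 % limit on the load change between subsequent data points.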
1990-10-01
to economic, technological, spatial or logistic concerns, or involve training, man-machine interfaces, or integration into existing systems. Once the...probabilistic reasoning, mixed analysis- and simulation-oriented, mixed computation- and communication-oriented, nonpreemptive static priority...scheduling base, nonrandomized, preemptive static priority scheduling base, randomized, simulation-oriented, and static scheduling base. The selection of both
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Construction machine control guidance implementation strategy.
DOT National Transportation Integrated Search
2010-07-01
Machine Controlled Guidance (MCG) technology may be used in roadway and bridge construction to improve construction efficiencies, potentially resulting in reduced project costs and accelerated schedules. The technology utilizes a Global Positioning S...
Porting Gravitational Wave Signal Extraction to Parallel Virtual Machine (PVM)
NASA Technical Reports Server (NTRS)
Thirumalainambi, Rajkumar; Thompson, David E.; Redmon, Jeffery
2009-01-01
Laser Interferometer Space Antenna (LISA) is a planned NASA-ESA mission to be launched around 2012. The Gravitational Wave detection is fundamentally the determination of frequency, source parameters, and waveform amplitude derived in a specific order from the interferometric time-series of the rotating LISA spacecraft. The LISA Science Team has developed a Mock LISA Data Challenge intended to promote the testing of complicated nested search algorithms to detect the 100-1 millihertz frequency signals at amplitudes of 10E-21. However, it has become clear that sequential search of the parameters is very time consuming and ultra-sensitive; hence, a new strategy has been developed. Parallelization of existing sequential search algorithms of Gravitational Wave signal identification consists of decomposing sequential search loops, beginning with outermost loops and working inward. In this process, the main challenge is to detect interdependencies among loops and to partition the loops so as to preserve concurrency. Existing parallel programs are based upon either shared memory or distributed memory paradigms. In PVM, master and node programs are used to execute parallelization and process spawning. The PVM can handle process management and process addressing schemes using a virtual machine configuration. The task scheduling and the messaging and signaling can be implemented efficiently for the LISA Gravitational Wave search process using a master and 6 nodes. This approach is accomplished using a server that is available at NASA Ames Research Center, and has been dedicated to the LISA Data Challenge Competition. Historically, gravitational wave and source identification parameters have taken around 7 days in this dedicated single-thread Linux-based server. Using the PVM approach, the parameter extraction problem can be reduced to within a day. 
The low frequency computation and a proxy signal-to-noise ratio are calculated in separate nodes that are controlled by the master using message and vector of data passing. The message passing among nodes follows a pattern of synchronous and asynchronous send-and-receive protocols. The communication model and the message buffers are allocated dynamically to address rapid search of gravitational wave source information in the Mock LISA data sets.
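The master-and-nodes decomposition of the outermost search loop described above can be illustrated with a minimal sketch. This is not PVM and not the LISA code: Python threads from concurrent.futures stand in for the 6 nodes, and score, search_chunk and master_search are hypothetical names with a toy scoring function.

```python
from concurrent.futures import ThreadPoolExecutor


def score(frequency, amplitude):
    """Hypothetical stand-in for the per-candidate signal match."""
    return -((frequency - 3.0) ** 2) - (amplitude - 2.0) ** 2


def search_chunk(frequencies, amplitudes):
    """Worker: run the inner loop for each outer-loop value in its chunk."""
    best = None
    for f in frequencies:        # this worker's slice of the outermost loop
        for a in amplitudes:     # inner loop stays sequential
            candidate = (score(f, a), f, a)
            if best is None or candidate > best:
                best = candidate
    return best


def master_search(frequencies, amplitudes, n_workers=6):
    """Master: split the outermost loop across workers, then reduce the results."""
    chunks = [frequencies[i::n_workers] for i in range(n_workers)]
    chunks = [c for c in chunks if c]  # drop empty slices
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: search_chunk(c, amplitudes), chunks))
    return max(results)
```

Because the iterations of the outer loop in this toy search are independent, the parallel result matches a single sequential call to search_chunk over the full range; detecting when that independence does not hold is the challenge the abstract points to.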
Clock Agreement Among Parallel Supercomputer Nodes
Jones, Terry R.; Koenig, Gregory A.
2014-04-30
This dataset presents measurements that quantify the clock synchronization time-agreement characteristics among several high performance computers including the current world's most powerful machine for open science, the U.S. Department of Energy's Titan machine sited at Oak Ridge National Laboratory. These ultra-fast machines derive much of their computational capability from extreme node counts (over 18000 nodes in the case of the Titan machine). Time-agreement is commonly utilized by parallel programming applications and tools, distributed programming applications and tools, and system software. Our time-agreement measurements detail the degree of time variance between nodes and how that variance changes over time. The dataset includes empirical measurements and the accompanying spreadsheets.
Characterizing parallel file-access patterns on a large-scale multiprocessor
NASA Technical Reports Server (NTRS)
Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.
1995-01-01
High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.
Application of high-performance computing to numerical simulation of human movement
NASA Technical Reports Server (NTRS)
Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.
1995-01-01
We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan
This report presents a specification for the Portals 4 network programming interface. Portals 4 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded systems. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
The Portals 4.0 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2012-11-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
NASA Astrophysics Data System (ADS)
Santosa, B.; Siswanto, N.; Fiqihesa
2018-04-01
This paper proposes a discrete Particle Swarm Optimization (PSO) to solve the limited-wait hybrid flow shop scheduling problem with multiple objectives. Flow shop scheduling represents the condition in which several machines are arranged in series and each job must be processed on each machine in the same sequence. The objective functions are minimizing completion time (makespan), total tardiness time, and total machine idle time. Flow shop scheduling models keep growing in order to represent real production systems accurately. Since flow shop scheduling is an NP-hard problem, metaheuristics are the most suitable solution method. One such metaheuristic is Particle Swarm Optimization (PSO), an algorithm based on the behavior of a swarm. Originally, PSO was intended to solve continuous optimization problems. Since flow shop scheduling is a discrete optimization problem, PSO must be modified to fit the problem. The modification is done by using a probability transition matrix mechanism. To handle the multi-objective problem, we use a Pareto-optimal approach (MPSO). The results of MPSO are better than those of PSO because the MPSO solution set has a higher probability of finding the optimal solution. In addition, the MPSO solution set is closer to the optimal solution.
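The Pareto handling of the three objectives named above (makespan, total tardiness, total machine idle time) can be illustrated with a minimal sketch of a non-dominated filter. This is a generic illustration, not the authors' MPSO; the function names and the sample objective vectors are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every objective
    and strictly better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep only the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (makespan, total tardiness, total idle time) values for three schedules.
candidates = [(10, 5, 3), (12, 4, 3), (11, 6, 4)]
front = pareto_front(candidates)
```

Here the third schedule is worse than the first in every objective and is filtered out, while the first two trade makespan against tardiness and both remain in the front.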
Applications of Parallel Computation in Micro-Mechanics and Finite Element Method
NASA Technical Reports Server (NTRS)
Tan, Hui-Qian
1996-01-01
This project discusses the application of parallel computation to material analyses. Briefly speaking, we analyze some kind of material by element computations. We call an element a cell here. A cell is divided into a number of subelements called subcells, and all subcells in a cell have the identical structure. The detailed structure will be given later in this paper. It is obvious that the problem is "well-structured", so a SIMD machine would be a better choice. In this paper we try to look into the potential of SIMD machines in dealing with finite element computation by developing appropriate algorithms on MasPar, a SIMD parallel machine. In section 2, the architecture of MasPar will be discussed. A brief review of the parallel programming language MPL is also given in that section. In section 3, some general parallel algorithms which might be useful to the project will be proposed. In combination with these algorithms, some features of MPL will be discussed in more detail. In section 4, the computational structure of the cell/subcell model will be given. The idea of designing the parallel algorithm for the model will be demonstrated. Finally, in section 5, a summary will be given.
NASA Astrophysics Data System (ADS)
Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander
2017-04-01
Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of their now more widely used non-spatial equivalents. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation): The first one is the parallelized version of sperrorest(), parsperrorest(). This function features two parallel modes to greatly speed up cross-validation runs. Both parallel modes are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and calls interactively (depending on the platform) parallel::mclapply() or parallel::parApply() in the background. While forking is used on Unix systems, Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package to perform parallelization. This method uses a different way of cluster parallelization than the parallel package does. In summary, the robustness of parsperrorest() is increased with the implementation of two independent parallel modes. A new way of partitioning the data in sperrorest is provided by partition.factor.cv(). 
This function gives the user the possibility to perform cross-validation at the level of some grouping structure. As an example, in remote sensing of agricultural land uses, pixels from the same field contain nearly identical information and will thus be jointly placed in either the test set or the training set. Other spatial resampling strategies are already available and can be extended by the user.
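The idea of cross-validation at the level of a grouping structure can be shown with a minimal sketch. It is written in Python rather than R and does not reproduce the sperrorest API; group_folds and the field labels are hypothetical, and the only property illustrated is that samples sharing a group label (for example, pixels from the same field) never end up in different folds.

```python
import random
from collections import defaultdict


def group_folds(groups, n_folds, seed=0):
    """Assign sample indices to folds so that all samples sharing a
    group label land in the same fold."""
    labels = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(labels)  # randomize which groups go to which fold
    fold_of_group = {g: i % n_folds for i, g in enumerate(labels)}
    folds = defaultdict(list)
    for idx, g in enumerate(groups):
        folds[fold_of_group[g]].append(idx)
    return [folds[k] for k in range(n_folds)]


# Hypothetical field labels for seven pixels.
fields = ["a", "a", "b", "b", "c", "c", "d"]
folds = group_folds(fields, n_folds=2)
```

Each fold then serves once as the test set while the remaining folds form the training set, so nearly identical pixels from one field cannot leak across the split.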
ERIC Educational Resources Information Center
Sukwong, Orathai
2013-01-01
Virtualization enables the ability to consolidate multiple servers on a single physical machine, increasing the infrastructure utilization. Maximizing the ratio of server virtual machines (VMs) to physical machines, namely the consolidation ratio, becomes an important goal toward infrastructure cost saving in a cloud. However, the consolidation…
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2012 CFR
2012-07-01
... paragraphs (b) and (c) of this section, an ATRS system shall be used with roof bolting machines and continuous-mining machines with integral roof bolters operated in a working section. The requirements of this paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28...
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2013 CFR
2013-07-01
... paragraphs (b) and (c) of this section, an ATRS system shall be used with roof bolting machines and continuous-mining machines with integral roof bolters operated in a working section. The requirements of this paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28...
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2014 CFR
2014-07-01
... paragraphs (b) and (c) of this section, an ATRS system shall be used with roof bolting machines and continuous-mining machines with integral roof bolters operated in a working section. The requirements of this paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28...
System software for the finite element machine
NASA Technical Reports Server (NTRS)
Crockett, T. W.; Knott, J. D.
1985-01-01
The Finite Element Machine is an experimental parallel computer developed at Langley Research Center to investigate the application of concurrent processing to structural engineering analysis. This report describes system-level software which has been developed to facilitate use of the machine by applications researchers. The overall software design is outlined, and several important parallel processing issues are discussed in detail, including processor management, communication, synchronization, and input/output. Based on experience using the system, the hardware architecture and software design are critiqued, and areas for further work are suggested.
NASA Astrophysics Data System (ADS)
Sembiring, N.; Panjaitan, N.; Saragih, A. F.
2018-02-01
PT. XYZ is a manufacturing company that processes fresh fruit bunches (FFB) into Crude Palm Oil (CPO) and Palm Kernel Oil (PKO). PT. XYZ consists of six work stations: receipt station, sterilizing station, threshing station, pressing station, clarification station, and kernel station. So far, the company still applies a corrective maintenance system for its production machines, in which a machine is repaired only after damage occurs. The problem at PT. XYZ is the absence of planned machine maintenance scheduling, so that machines are often damaged, which can disrupt smooth production. Another problem addressed in this research is that the kernel station environment has become inconvenient for operators: there are unused machines and equipment in the production area, the floor is slippery and muddy, fibers are scattered, PPE use is incomplete, and employee discipline is lacking. The most commonly damaged machine is the cake breaker conveyor machine in the seed processing station (kernel station). The proposed solution is a maintenance schedule plan for the machine using the reliability centered maintenance method, together with the application of 5S. Applying the reliability centered maintenance method yields four components that must receive scheduled (time-directed) treatment, with intervals of 37 days for the bearing component, 97 days for the gearbox component, 35 days for the CBC pen component, and 32 days for the conveyor pedal component. The 5S assessment yields proposed workplace improvement measures in accordance with the principles of 5S: moving unused goods out of the production area, grouping goods based on their use, determining the procedure for cleaning the production area, inspecting the use of PPE, and making 5S slogans.
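The four time-directed intervals reported above can be turned into a simple calendar, as in the following sketch. The intervals come from the abstract; the 100-day horizon, the day-0 starting point, and the function names are assumptions made for illustration only.

```python
# Time-directed maintenance intervals (days) reported in the abstract.
INTERVALS = {"bearing": 37, "gearbox": 97, "CBC pen": 35, "conveyor pedal": 32}


def due_days(interval, horizon):
    """Days (counted from day 0) on which a component is due within the horizon."""
    return list(range(interval, horizon + 1, interval))


def schedule(intervals, horizon):
    """Chronological list of (day, component) maintenance events."""
    events = [(day, name)
              for name, interval in intervals.items()
              for day in due_days(interval, horizon)]
    return sorted(events)


plan = schedule(INTERVALS, horizon=100)
```

Over the assumed 100-day horizon this gives eight maintenance events, starting with the conveyor pedal on day 32 and ending with the gearbox on day 97.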
Solving the Cauchy-Riemann equations on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Optimizing Irregular Applications for Energy and Performance on the Tilera Many-core Architecture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Panyala, Ajay R.; Halappanavar, Mahantesh
Optimizing applications simultaneously for energy and performance is a complex problem. High performance, parallel, irregular applications are notoriously hard to optimize due to their data-dependent memory accesses, lack of structured locality and complex data structures and code patterns. Irregular kernels are growing in importance in applications such as machine learning, graph analytics and combinatorial scientific computing. Performance- and energy-efficient implementation of these kernels on modern, energy efficient, multicore and many-core platforms is therefore an important and challenging problem. We present results from optimizing two irregular applications, the Louvain method for community detection (Grappolo) and high-performance conjugate gradient (HPCCG), on the Tilera many-core system. We have significantly extended MIT's OpenTuner auto-tuning framework to conduct a detailed study of platform-independent and platform-specific optimizations to improve performance as well as reduce total energy consumption. We explore the optimization design space along three dimensions: memory layout schemes, compiler-based code transformations, and optimization of parallel loop schedules. Using auto-tuning, we demonstrate whole node energy savings of up to 41% relative to a baseline instantiation, and up to 31% relative to manually optimized variants.
NASA Astrophysics Data System (ADS)
Tabekina, N. A.; Chepchurov, M. S.; Evtushenko, E. I.; Dmitrievsky, B. S.
2018-05-01
The work solves the problem of automating a machining process, namely turning, to produce parts having planes parallel to the axis of rotation of the part without using special tools. According to the results, equipping the lathe with a high-speed electromechanical drive to control its operative movements makes it possible to obtain planes parallel to the part axis. The method of obtaining planes parallel to the part axis is based on a mathematical model, which is presented as a functional dependency between the conveying velocity of the driven element and time. It describes the operative movements of the lathe along the whole tool path. Using the model of tool movement, it has been found that the conveying velocity varies from its maximum value to zero, which allows the drive to be reversed. A scheme of tool placement relative to the workpiece has been proposed for unidirectional movement of the driven element at high conveying velocity. The control method for CNC machines can be used to produce geometrically complex parts on the lathe without using special milling tools.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Novikov, V.
1991-05-01
The U.S. Army's detailed equipment decontamination process is a stochastic flow shop which has N independent non-identical jobs (vehicles) which have overlapping processing times. This flow shop consists of up to six non-identical machines (stations). With the exception of one station, the processing times of the jobs are random variables. Based on an analysis of the processing times, the jobs for the 56 Army heavy division companies were scheduled according to the best shortest expected processing time - longest expected processing time (SEPT-LEPT) sequence. To assist in this scheduling the Gap Comparison Heuristic was developed to select the best SEPT-LEPT schedule. This schedule was then used in balancing the detailed equipment decon line in order to find the best possible site configuration subject to several constraints. The detailed troop decon line, in which all jobs are independent and identically distributed, was then balanced. Lastly, an NBC decon optimization computer program was developed using the scheduling and line balancing results. This program serves as a prototype module for the ANBACIS automated NBC decision support system. Keywords: Decontamination, Stochastic flow shop, Scheduling, Stochastic scheduling, Minimization of the makespan, SEPT-LEPT Sequences, Flow shop line balancing, ANBACIS.
Multiagent scheduling method with earliness and tardiness objectives in flexible job shops.
Wu, Zuobao; Weng, Michael X
2005-04-01
Flexible job-shop scheduling problems are an important extension of the classical job-shop scheduling problems and present additional complexity. Such problems are mainly due to the existence of a considerable amount of overlapping capacities with modern machines. Classical scheduling methods are generally incapable of addressing such capacity overlapping. We propose a multiagent scheduling method with job earliness and tardiness objectives in a flexible job-shop environment. The earliness and tardiness objectives are consistent with the just-in-time production philosophy, which has attracted significant attention in both industry and the academic community. A new job-routing and sequencing mechanism is proposed. In this mechanism, two kinds of jobs are defined to distinguish jobs with one operation left from jobs with more than one operation left. Different criteria are proposed to route these two kinds of jobs. Job sequencing makes it possible to hold a job that may be completed too early. Two heuristic algorithms for job sequencing are developed to deal with these two kinds of jobs. The computational experiments show that the proposed multiagent scheduling method significantly outperforms the existing scheduling methods in the literature. In addition, the proposed method is quite fast. In fact, the simulation time to find a complete schedule with over 2000 jobs on ten machines is less than 1.5 min.
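The earliness and tardiness objective, and the idea of holding a job that would otherwise finish too early, can be made concrete with a small sketch. The weights, function names, and the single-operation hold rule are assumptions for illustration; they are not the routing or sequencing heuristics of the paper.

```python
def earliness_tardiness(completion, due, early_weight=1.0, tardy_weight=1.0):
    """Weighted penalty for finishing before or after the due date."""
    earliness = max(0, due - completion)
    tardiness = max(0, completion - due)
    return early_weight * earliness + tardy_weight * tardiness


def hold_start(ready, processing, due):
    """Latest start for a job's last operation that still meets the due date,
    but never earlier than the time the job is ready."""
    return max(ready, due - processing)


# A job ready at time 0 with 3 time units of work and a due date of 10.
start = hold_start(ready=0, processing=3, due=10)
penalty = earliness_tardiness(completion=start + 3, due=10)
```

Started immediately, the job would finish at time 3 and incur an earliness of 7; held until time 7, it completes exactly at the due date with zero penalty, which is the just-in-time behaviour the abstract describes.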
Some single-machine scheduling problems with learning effects and two competing agents.
Li, Hongjie; Li, Zeyuan; Yin, Yunqiang
2014-01-01
This study considers a scheduling environment in which there are two agents and a set of jobs, each of which belongs to one of the two agents and its actual processing time is defined as a decreasing linear function of its starting time. Each of the two agents competes to process its respective jobs on a single machine and has its own scheduling objective to optimize. The objective is to assign the jobs so that the resulting schedule performs well with respect to the objectives of both agents. The objective functions addressed in this study include the maximum cost, the total weighted completion time, and the discounted total weighted completion time. We investigate three problems arising from different combinations of the objectives of the two agents. The computational complexity of the problems is discussed and solution algorithms where possible are presented.
NASA Astrophysics Data System (ADS)
Wang, Ji-Bo; Wang, Ming-Zheng; Ji, Ping
2012-05-01
In this article, we consider a single machine scheduling problem with a time-dependent learning effect and deteriorating jobs. By the effects of time-dependent learning and deterioration, we mean that the job processing time is defined by a function of its starting time and total normal processing time of jobs in front of it in the sequence. The objective is to determine an optimal schedule so as to minimize the total completion time. This problem remains open for the case of -1 < a < 0, where a denotes the learning index; we show that an optimal schedule of the problem is V-shaped with respect to job normal processing times. Three heuristic algorithms utilising the V-shaped property are proposed, and computational experiments show that the last heuristic algorithm performs effectively and efficiently in obtaining near-optimal solutions.
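The V-shaped property mentioned above is commonly understood to mean that jobs placed before the job with the smallest normal processing time appear in non-increasing order of normal processing time, and jobs after it in non-decreasing order. A minimal check under that assumed definition (the function name is hypothetical):

```python
def is_v_shaped(times):
    """True if the sequence is non-increasing up to its first minimum
    and non-decreasing from that minimum onward."""
    k = times.index(min(times))
    before, after = times[:k + 1], times[k:]
    descending = all(a >= b for a, b in zip(before, before[1:]))
    ascending = all(a <= b for a, b in zip(after, after[1:]))
    return descending and ascending
```

A sequence of normal processing times such as 9, 6, 2, 4, 7 passes the check, whereas 9, 2, 6, 4 does not, so a heuristic that exploits the property only needs to search among sequences of the first kind.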
The portals 4.0.1 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2013-04-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generation of machines employing advanced network interface architectures that support enhanced offload capabilities.
Execution models for mapping programs onto distributed memory parallel computers
NASA Technical Reports Server (NTRS)
Sussman, Alan
1992-01-01
The problem of exploiting the parallelism available in a program to efficiently employ the resources of the target machine is addressed. The problem is discussed in the context of building a mapping compiler for a distributed memory parallel machine. The paper describes using execution models to drive the process of mapping a program in the most efficient way onto a particular machine. Through analysis of the execution models for several mapping techniques for one class of programs, we show that the selection of the best technique for a particular program instance can make a significant difference in performance. On the other hand, the results of benchmarks from an implementation of a mapping compiler show that our execution models are accurate enough to select the best mapping technique for a given program.
Machine learning in updating predictive models of planning and scheduling transportation projects
DOT National Transportation Integrated Search
1997-01-01
A method combining machine learning and regression analysis to automatically and intelligently update predictive models used in the Kansas Department of Transportation's (KDOT's) internal management system is presented. The predictive models used...
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2013 CFR
2013-07-01
... all electrical components for materials, workmanship, design, and construction; (2) Examination of all components of the machine which have been approved or certified under Bureau of Mines Schedule 2D, 2E, 2F, or...
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2012 CFR
2012-07-01
... all electrical components for materials, workmanship, design, and construction; (2) Examination of all components of the machine which have been approved or certified under Bureau of Mines Schedule 2D, 2E, 2F, or...
30 CFR 18.97 - Inspection of machines; minimum requirements.
Code of Federal Regulations, 2014 CFR
2014-07-01
... all electrical components for materials, workmanship, design, and construction; (2) Examination of all components of the machine which have been approved or certified under Bureau of Mines Schedule 2D, 2E, 2F, or...
Virtual Mission Operations of Remote Sensors With Rapid Access To and From Space
NASA Technical Reports Server (NTRS)
Ivancic, William D.; Stewart, Dave; Walke, Jon; Dikeman, Larry; Sage, Steven; Miller, Eric; Northam, James; Jackson, Chris; Taylor, John; Lynch, Scott;
2010-01-01
This paper describes network-centric operations, where a virtual mission operations center autonomously receives sensor triggers, and schedules space and ground assets using Internet-based technologies and service-oriented architectures. For proof-of-concept purposes, sensor triggers are received from the United States Geological Survey (USGS) to determine targets for space-based sensors. The Surrey Satellite Technology Limited (SSTL) Disaster Monitoring Constellation satellite, the United Kingdom Disaster Monitoring Constellation (UK-DMC), is used as the space-based sensor. The UK-DMC's availability is determined via machine-to-machine communications using SSTL's mission planning system. Access to/from the UK-DMC for tasking and sensor data is via SSTL's and Universal Space Network's (USN) ground assets. The availability and scheduling of USN's assets can also be performed autonomously via machine-to-machine communications. All communication, both on the ground and between ground and space, uses open Internet standards.
Scheduling Earth Observing Fleets Using Evolutionary Algorithms: Problem Description and Approach
NASA Technical Reports Server (NTRS)
Globus, Al; Crawford, James; Lohn, Jason; Morris, Robert; Clancy, Daniel (Technical Monitor)
2002-01-01
We describe work in progress concerning multi-instrument, multi-satellite scheduling. Most, although not all, Earth observing instruments currently in orbit are unique. In the relatively near future, however, we expect to see fleets of Earth observing spacecraft, many carrying nearly identical instruments. This presents a substantially new scheduling challenge. Inspired by successful commercial applications of evolutionary algorithms in scheduling domains, this paper presents work in progress regarding the use of evolutionary algorithms to solve a set of Earth observing related model problems. Both the model problems and the software are described. Since the larger problems will require substantial computation and evolutionary algorithms are embarrassingly parallel, we discuss our parallelization techniques using dedicated and cycle-scavenged workstations.
Frutos, M; Méndez, M; Tohmé, F; Broz, D
2013-01-01
Many of the problems that arise in production systems can be handled with multiobjective techniques. One of those problems is that of scheduling operations subject to constraints on the availability of machines and buffer capacity. In this paper we analyze different Multiobjective Evolutionary Algorithms (MOEAs) for this kind of problem. We consider an experimental framework in which we schedule production operations for four real-world Job-Shop contexts using three algorithms: NSGAII, SPEA2, and IBEA. Using two performance indexes, Hypervolume and R2, we found that SPEA2 and IBEA are the most efficient for the tasks at hand. On the other hand, IBEA seems to be a better choice of tool since it yields more solutions in the approximate Pareto frontier.
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
Single machine scheduling with slack due dates assignment
NASA Astrophysics Data System (ADS)
Liu, Weiguo; Hu, Xiangpei; Wang, Xuyin
2017-04-01
This paper considers a single machine scheduling problem in which each job is assigned an individual due date based on a common flow allowance (i.e. all jobs have slack due date). The goal is to find a sequence for jobs, together with a due date assignment, that minimizes a non-regular criterion comprising the total weighted absolute lateness value and common flow allowance cost, where the weight is a position-dependent weight. In order to solve this problem, an ? time algorithm is proposed. Some extensions of the problem are also shown.
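The objective described above can be made concrete with a small evaluation routine. The sketch below assumes the usual slack (SLK) rule, in which each due date equals the job's processing time plus the common flow allowance q. The function name, the cost coefficient gamma, and the sample data are hypothetical and are not taken from the paper. It only evaluates one given sequence and does not reproduce the paper's algorithm.

```python
def slk_objective(proc_times, position_weights, q, gamma):
    """Cost of processing jobs in the given order under slack due dates.

    Assumptions (illustrative, not from the paper):
      due date   d_j = p_j + q          (q = common flow allowance)
      lateness   L_j = C_j - d_j
      cost       = sum_r w_r * |L_[r]| + gamma * q
    where w_r is the weight attached to position r of the sequence.
    """
    completion = 0
    total = gamma * q
    for p, w in zip(proc_times, position_weights):
        completion += p              # completion time of the job in this position
        due = p + q                  # slack due date
        total += w * abs(completion - due)
    return total


# Three jobs in a fixed order, with hypothetical weights and costs.
cost = slk_objective([2, 3, 1], [3, 2, 1], q=2, gamma=1)
print(cost)  # 11
```

With these numbers the absolute lateness values are 2, 0 and 3, so the cost is 3*2 + 2*0 + 1*3 plus the allowance cost 1*2.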
Specification and Analysis of Parallel Machine Architecture
1990-03-17
Parallel Machine Architecture C.V. Ramamoorthy Computer Science Division Dept. of Electrical Engineering and Computer Science University of California...capacity. (4) Adaptive: The overhead in resolution of deadlocks, etc. should be in proportion to their frequency. (5) Avoid rollbacks: Rollbacks can be...snapshots of system state graphically at a rate proportional to simulation time. Some of the examples are as follow: (1) When the simulation clock of
Scaling Support Vector Machines On Modern HPC Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Fu, Haohuan; Song, Shuaiwen
2015-02-01
We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.
Efficiently Scheduling Multi-core Guest Virtual Machines on Multi-core Hosts in Network Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S
2011-01-01
Virtual machine (VM)-based simulation is a method used by network simulators to incorporate realistic application behaviors by executing actual VMs as high-fidelity surrogates for simulated end-hosts. A critical requirement in such a method is the simulation time-ordered scheduling and execution of the VMs. Prior approaches such as time dilation are less efficient due to the high degree of multiplexing possible when multiple multi-core VMs are simulated on multi-core host systems. We present a new simulation time-ordered scheduler to efficiently schedule multi-core VMs on multi-core real hosts, with a virtual clock realized on each virtual core. The distinguishing features of our approach are: (1) customizable granularity of the VM scheduling time unit on the simulation time axis, (2) ability to take arbitrary leaps in virtual time by VMs to maximize the utilization of host (real) cores when guest virtual cores idle, and (3) empirically determinable optimality in the tradeoff between total execution (real) time and time-ordering accuracy levels. Experiments show that it is possible to get nearly perfect time-ordered execution, with a slight cost in total run time, relative to optimized non-simulation VM schedulers. Interestingly, with our time-ordered scheduler, it is also possible to reduce the time-ordering error from over 50% with a non-simulation scheduler to less than 1% with our scheduler, with almost the same run time efficiency as that of the highly efficient non-simulation VM schedulers.
Architectures for reasoning in parallel
NASA Technical Reports Server (NTRS)
Hall, Lawrence O.
1989-01-01
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system, and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or fewer processors. Additionally, experiments have been run with a stripped down version of EMYCIN.
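The future construct mentioned above has a loose analogue in Python's standard concurrent.futures module. The sketch below is only an analogy under that assumption: it is not Multilisp, and the functions fib and fib_with_future are hypothetical examples unrelated to Backpac or Datapac. It shows a call being spawned as a task that runs alongside the spawning task, with the value claimed later.

```python
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    """Plain recursive Fibonacci, used only as a stand-in workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def fib_with_future(n):
    """Compute fib(n) with one branch spawned as a future.

    pool.submit plays the role of Multilisp's (future ...): the call
    returns immediately and the spawned task proceeds in parallel with
    the caller until its value is needed.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        left = pool.submit(fib, n - 1)   # spawned task
        right = fib(n - 2)               # spawning task keeps working
        return left.result() + right     # touch the future: wait for its value


print(fib_with_future(16))  # 987
```

In CPython, threads generally do not speed up CPU-bound code like this, so the sketch illustrates the control structure rather than a performance gain.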
On the relationship between parallel computation and graph embedding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gupta, A.K.
1989-01-01
The problem of efficiently simulating an algorithm designed for an n-processor parallel machine G on an m-processor parallel machine H with n > m arises when parallel algorithms designed for an ideal size machine are simulated on existing machines which are of a fixed size. The author studies this problem when every processor of H takes over the function of a number of processors in G, and he phrases the simulation problem as a graph embedding problem. New embeddings presented address relevant issues arising from the parallel computation environment. The main focus centers around embedding complete binary trees into smaller-sized binary trees, butterflies, and hypercubes. He also considers simultaneous embeddings of r source machines into a single hypercube. Constant factors play a crucial role in his embeddings since they are not only important in practice but also lead to interesting theoretical problems. All of his embeddings minimize dilation and load, which are the conventional cost measures in graph embeddings and determine the maximum amount of time required to simulate one step of G on H. His embeddings also optimize a new cost measure called (α,β)-utilization which characterizes how evenly the processors of H are used by the processors of G. Ideally, the utilization should be balanced (i.e., every processor of H simulates at most (n/m) processors of G) and the (α,β)-utilization measures how far off from a balanced utilization the embedding is. He presents embeddings for the situation when some processors of G have different capabilities (e.g. memory or I/O) than others and the processors with different capabilities are to be distributed uniformly among the processors of H. Placing such conditions on an embedding results in an increase in some of the cost measures.
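The two conventional cost measures named above, dilation and load, can be computed directly for any given embedding. The sketch below assumes the host H is a hypercube whose nodes are labelled by integers, so that host distance is the Hamming distance between labels. The function names and the sample placement of a 7-node complete binary tree on a 4-node hypercube are hypothetical and are not one of the author's embeddings.

```python
from collections import Counter


def hamming(a, b):
    """Distance between two hypercube nodes given as integer labels."""
    return bin(a ^ b).count("1")


def dilation_and_load(guest_edges, placement):
    """Return (dilation, load) of an embedding into a hypercube.

    guest_edges: iterable of (u, v) edges of the guest graph G
    placement:   dict mapping each guest node to a host node label
    load     = largest number of guest nodes placed on one host node
    dilation = largest host distance between the images of adjacent guest nodes
    """
    load = max(Counter(placement.values()).values())
    dilation = max(hamming(placement[u], placement[v]) for u, v in guest_edges)
    return dilation, load


# Complete binary tree on 7 nodes (heap numbering) placed on a 2-cube (hosts 0..3).
tree_edges = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6), (3, 7)]
placement = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2, 6: 3, 7: 1}
print(dilation_and_load(tree_edges, placement))  # (1, 2)
```

Here no host node simulates more than two tree nodes, which matches the balanced bound of n/m rounded up for n = 7 and m = 4.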
Parallel Preconditioning for CFD Problems on the CM-5
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)
1994-01-01
To date, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver, such as incomplete LU factorizations, are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) is difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore, the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
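The idea of building an approximate inverse from n independent least squares problems can be illustrated in its simplest form. The sketch below assumes the most restrictive sparsity pattern, a diagonal approximate inverse M, so that column k reduces to a scalar problem: minimise ||m * A[:, k] - e_k|| over m, whose solution is m = A[k][k] / ||A[:, k]||^2. This pattern choice and the function names are illustrative assumptions and are not the pattern or code used in the paper.

```python
def diagonal_approximate_inverse(A):
    """Diagonal M minimising ||A M - I||_F, one column at a time.

    Each column k is an independent least squares problem
        min_m || m * A[:, k] - e_k ||_2 ,
    with closed-form solution m = A[k][k] / sum_i A[i][k]**2,
    so all n columns could be computed in parallel.
    """
    n = len(A)
    diag = []
    for k in range(n):
        col_norm_sq = sum(A[i][k] ** 2 for i in range(n))
        diag.append(A[k][k] / col_norm_sq)
    return diag


def apply_preconditioner(diag, x):
    """Applying M is a matrix-vector product; here it is elementwise."""
    return [d * xi for d, xi in zip(diag, x)]


A = [[4.0, 1.0],
     [2.0, 3.0]]
M = diagonal_approximate_inverse(A)
print(M)  # approximately [0.2, 0.3]
```

A richer sparsity pattern gives a small dense least squares problem per column instead of a scalar one, but the columns remain independent of each other.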
A Genetic Algorithm for Flow Shop Scheduling with Assembly Operations to Minimize Makespan
NASA Astrophysics Data System (ADS)
Bhongade, A. S.; Khodke, P. M.
2014-04-01
Manufacturing systems in which several parts are processed through machining workstations and later assembled to form final products are common. Although such scheduling problems are solved using heuristics, available solution approaches can provide solutions only for moderate-sized problems because of the large computation time required. In this work, a scheduling approach is developed for such flow-shop manufacturing systems having machining workstations followed by assembly workstations. The initial schedule is generated using the Disjunctive method, and a genetic algorithm (GA) is then applied to generate schedules for large-sized problems. GA is found to give near-optimal solutions based on the deviation of makespan from the lower bound. The lower bound of the makespan for such problems is estimated, and the percent deviation of makespan from the lower bound is used as a performance measure to evaluate the schedules. Computational experiments are conducted on problems developed using a fractional factorial orthogonal array, varying the number of parts per product, number of products, and number of workstations (ranging up to 1,520 operations). A statistical analysis indicated the significance of all three factors considered. It is concluded that the GA method can obtain optimal makespan.
NASA Astrophysics Data System (ADS)
Capone, V.; Esposito, R.; Pardi, S.; Taurino, F.; Tortone, G.
2012-12-01
Over the last few years we have seen an increasing number of services and applications needed to manage and maintain cloud computing facilities. This is particularly true for computing in high energy physics, which often requires complex configurations and distributed infrastructures. In this scenario a cost effective rationalization and consolidation strategy is the key to success in terms of scalability and reliability. In this work we describe an IaaS (Infrastructure as a Service) cloud computing system, with high availability and redundancy features, which is currently in production at INFN-Naples and ATLAS Tier-2 data centre. The main goal we intended to achieve was a simplified method to manage our computing resources and deliver reliable user services, reusing existing hardware without incurring heavy costs. A combined usage of virtualization and clustering technologies allowed us to consolidate our services on a small number of physical machines, reducing electric power costs. As a result of our efforts we developed a complete solution for data and computing centres that can be easily replicated using commodity hardware. Our architecture consists of 2 main subsystems: a clustered storage solution, built on top of disk servers running GlusterFS file system, and a virtual machines execution environment. GlusterFS is a network file system able to perform parallel writes on multiple disk servers, providing this way live replication of data. High availability is also achieved via a network configuration using redundant switches and multiple paths between hypervisor hosts and disk servers. We also developed a set of management scripts to easily perform basic system administration tasks such as automatic deployment of new virtual machines, adaptive scheduling of virtual machines on hypervisor hosts, live migration and automated restart in case of hypervisor failures.
Constraint monitoring in TOSCA
NASA Technical Reports Server (NTRS)
Beck, Howard
1992-01-01
The Job-Shop Scheduling Problem (JSSP) deals with the allocation of resources over time to factory operations. Allocations are subject to various constraints (e.g., production precedence relationships, factory capacity constraints, and limits on the allowable number of machine setups) which must be satisfied for a schedule to be valid. The identification of constraint violations and the monitoring of constraint threats plays a vital role in schedule generation in terms of the following: (1) directing the scheduling process; and (2) informing scheduling decisions. This paper describes a general mechanism for identifying constraint violations and monitoring threats to the satisfaction of constraints throughout schedule generation.
Exploiting loop level parallelism in nonprocedural dataflow programs
NASA Technical Reports Server (NTRS)
Gokhale, Maya B.
1987-01-01
Discussed are how loop level parallelism is detected in a nonprocedural dataflow program, and how a procedural program with concurrent loops is scheduled. Also discussed is a program restructuring technique which may be applied to recursive equations so that concurrent loops may be generated for a seemingly iterative computation. A compiler which generates C code for the language described below has been implemented. The scheduling component of the compiler and the restructuring transformation are described.
Computer-Aided Parallelizer and Optimizer
NASA Technical Reports Server (NTRS)
Jin, Haoqiang
2011-01-01
The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.
Frutos, M.; Méndez, M.; Tohmé, F.; Broz, D.
2013-01-01
Many of the problems that arise in production systems can be handled with multiobjective techniques. One of those problems is that of scheduling operations subject to constraints on the availability of machines and buffer capacity. In this paper we analyze different Multiobjective Evolutionary Algorithms (MOEAs) for this kind of problem. We consider an experimental framework in which we schedule production operations for four real-world Job-Shop contexts using three algorithms: NSGAII, SPEA2, and IBEA. Using two performance indexes, Hypervolume and R2, we found that SPEA2 and IBEA are the most efficient for the tasks at hand. On the other hand, IBEA seems to be a better choice of tool since it yields more solutions in the approximate Pareto frontier. PMID:24489502
2008-03-01
order fulfillment visibility, Kanban deployment, inventory count can be made visually, machines and tool labeling, costs, preventive maintenance...order fulfillment, computer scheduling versus Kanban, pull versus push systems, flow time efficiencies, back room costs of scheduling, MRP costs
An efficient annealing in Boltzmann machine in Hopfield neural network
NASA Astrophysics Data System (ADS)
Kin, Teoh Yeong; Hasan, Suzanawati Abu; Bulot, Norhisam; Ismail, Mohammad Hafiz
2012-09-01
This paper proposes and implements a Boltzmann machine in a Hopfield neural network doing logic programming based on an energy minimization system. Temperature scheduling in the Boltzmann machine enhances the performance of logic programming in the Hopfield neural network. The finest temperature is determined by observing the ratio of global solutions and the final Hamming distance using computer simulations. The study shows that the Boltzmann machine model is more stable and competent in terms of representing and solving difficult combinatorial problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1978-06-01
Following a planning period during which the Lawrence Livermore Laboratory and the Department of Defense managing sponsor, the USAF Materials Laboratory, agreed on work statements, the Department of Defense Tri-Service Precision Machine-Tool Program began in February 1978. Milestones scheduled for the first quarter have been met. Tasks and manpower requirements for two basic projects, precision-machining commercialization (PMC) and a machine-tool task force (MTTF), were defined. Progress by PMC includes: (1) documentation of existing precision machine-tool technology by initiation and compilation of a bibliography containing several hundred entries; (2) identification of the problems and needs of precision turning-machine builders and of precision turning-machine users interested in developing high-precision machining capability; and (3) organization of the schedule and content of the first seminar, to be held in October 1978, which will bring together representatives from the machine-tool and optics communities to address the problems and begin the process of high-precision machining commercialization. Progress by MTTF includes: (1) planning for the organization of a team effort of approximately 60 to 80 international experts to contribute in various ways to project objectives, namely, to summarize state-of-the-art cutting-machine-tool technology and to identify areas where future R and D should prove technically and economically profitable; (2) preparation of a comprehensive plan to achieve those objectives; and (3) preliminary arrangements for a plenary session, also in October, when the task force will meet to formalize the details for implementing the plan.
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1989-01-01
The data parallel implementation of a particle simulation for hypersonic rarefied flow described by Dagum associates a single parallel data element with each particle in the simulation. The simulated space is divided into discrete regions called cells containing a variable and constantly changing number of particles. The implementation requires a global sort of the parallel data elements so as to arrange them in an order that allows immediate access to the information associated with cells in the simulation. Described here is a very fast algorithm for performing the necessary ranking of the parallel data elements. The performance of the new algorithm is compared with that of the microcoded instruction for ranking on the Connection Machine.
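What the ranking step computes can be shown with a serial counting sort. This is only a sketch of the result, assuming each particle carries an integer cell index. It is not the parallel Connection Machine algorithm from the paper, and the function name and sample data are hypothetical.

```python
def rank_by_cell(cell_of_particle, num_cells):
    """Rank particles so that particles of the same cell are contiguous.

    Returns (ranks, start): ranks[i] is the slot of particle i in the
    cell-sorted order, and start[c] is the first slot belonging to cell c,
    which gives immediate access to the particles of any cell.
    """
    counts = [0] * num_cells
    for c in cell_of_particle:
        counts[c] += 1

    # Exclusive prefix sum of the counts gives each cell's first slot.
    start = [0] * num_cells
    for c in range(1, num_cells):
        start[c] = start[c - 1] + counts[c - 1]

    next_slot = list(start)
    ranks = []
    for c in cell_of_particle:
        ranks.append(next_slot[c])   # stable: earlier particles get earlier slots
        next_slot[c] += 1
    return ranks, start


ranks, start = rank_by_cell([2, 0, 1, 0, 2], num_cells=3)
print(ranks)  # [3, 0, 2, 1, 4]
print(start)  # [0, 2, 3]
```

Because particles move between cells at every time step, the ranking has to be recomputed each step, which is why its speed matters in the simulation.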
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Moitra, Stuti
1996-01-01
Various tridiagonal solvers have been proposed in recent years for different parallel platforms. In this paper, the performance of three tridiagonal solvers, namely, the parallel partition LU algorithm, the parallel diagonal dominant algorithm, and the reduced diagonal dominant algorithm, is studied. These algorithms are designed for distributed-memory machines and are tested on an Intel Paragon and an IBM SP2 machine. Measured results are reported in terms of execution time and speedup. Analytical studies are conducted for different communication topologies and for different tridiagonal systems. The measured results match the analytical results closely. In addition to addressing implementation issues, performance considerations such as problem sizes and models of speedup are also discussed.
Early experiences in developing and managing the neuroscience gateway.
Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas T
2015-02-01
The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway.
Early experiences in developing and managing the neuroscience gateway
Sivagnanam, Subhashini; Majumdar, Amit; Yoshimoto, Kenneth; Astakhov, Vadim; Bandrowski, Anita; Martone, MaryAnn; Carnevale, Nicholas. T.
2015-01-01
SUMMARY The last few decades have seen the emergence of computational neuroscience as a mature field where researchers are interested in modeling complex and large neuronal systems and require access to high performance computing machines and associated cyber infrastructure to manage computational workflow and data. The neuronal simulation tools, used in this research field, are also implemented for parallel computers and suitable for high performance computing machines. But using these tools on complex high performance computing machines remains a challenge because of issues with acquiring computer time on these machines located at national supercomputer centers, dealing with complex user interface of these machines, dealing with data management and retrieval. The Neuroscience Gateway is being developed to alleviate and/or hide these barriers to entry for computational neuroscientists. It hides or eliminates, from the point of view of the users, all the administrative and technical barriers and makes parallel neuronal simulation tools easily available and accessible on complex high performance computing machines. It handles the running of jobs and data management and retrieval. This paper shares the early experiences in bringing up this gateway and describes the software architecture it is based on, how it is implemented, and how users can use this for computational neuroscience research using high performance computing at the back end. We also look at parallel scaling of some publicly available neuronal models and analyze the recent usage data of the neuroscience gateway. PMID:26523124
2000-04-01
be an extension of Utah’s nascent Quarks system, oriented to closely coupled cluster environments. However, the grant did not actually begin until... Intel x86, implemented ten virtual machine monitors and servers, including a virtual memory manager, a checkpointer, a process manager, a file server...Fluke, we developed a novel hierarchical processor scheduling frame- work called CPU inheritance scheduling [5]. This is a framework for scheduling
NASA Technical Reports Server (NTRS)
Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.
1995-01-01
A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech, a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64^3 particles per processing node (approximately 134 million particles on 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--on other parallel machines once the machine-dependent parameters are known.
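The slab, rod, and cube partitions described above differ only in how many processors are laid along each axis. The sketch below assumes a regular decomposition of a rectangular box and a row-major numbering of processors. The function name, the unit box, and the sample position are hypothetical and are not taken from the GCPIC code.

```python
def owner_rank(pos, box, procs):
    """Processor rank that owns a particle under a regular decomposition.

    pos:   (x, y, z) particle position, with 0 <= coordinate <= box length
    box:   (Lx, Ly, Lz) size of the simulation domain
    procs: (px, py, pz) processors per axis:
           (P, 1, 1) gives slabs, (P, Q, 1) rods, (P, Q, R) cubes
    """
    idx = []
    for x, length, p in zip(pos, box, procs):
        i = min(int(x / length * p), p - 1)   # clamp particles on the upper face
        idx.append(i)
    ix, iy, iz = idx
    px, py, pz = procs
    return (ix * py + iy) * pz + iz           # row-major rank (assumed numbering)


box = (1.0, 1.0, 1.0)
particle = (0.6, 0.3, 0.9)
print(owner_rank(particle, box, (4, 1, 1)))  # slabs: 2
print(owner_rank(particle, box, (2, 2, 1)))  # rods:  2
print(owner_rank(particle, box, (2, 2, 2)))  # cubes: 5
```

As a check on the problem size quoted in the abstract, 64**3 particles per node on 512 nodes is 262,144 * 512 = 134,217,728 particles, or roughly 134 million.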
NASA Astrophysics Data System (ADS)
Lucian, P.; Gheorghe, S.
2017-08-01
This paper presents a new method, based on the FRISCO formula, for optimizing the choice of the best control system for kinematical feed chains with great distance between slides used in computer numerical controlled machine tools. Such machines are usually, but not exclusively, used for machining large and complex parts (mostly in the aviation industry) or complex casting molds. For such machine tools the kinematic feed chains are arranged in a dual-parallel drive structure that allows the mobile element to be moved by the two kinematical branches and their related control systems. Such an arrangement allows for high speed and high rigidity (a critical requirement for precision machining) during the machining process. A significant issue for such an arrangement is the ability of the two parallel control systems to follow the same trajectory accurately. To address this issue it is necessary to achieve synchronous motion control for the two kinematical branches, ensuring that the correct perpendicular position is kept by the mobile element during its motion on the two slides.
Communication Studies of DMP and SMP Machines
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak; Chancellor, Marisa K. (Technical Monitor)
1997-01-01
Understanding the interplay between machines and problems is key to obtaining high performance on parallel machines. This paper investigates the interplay between programming paradigms and communication capabilities of parallel machines. In particular, we explicate the communication capabilities of the IBM SP-2 distributed-memory multiprocessor and the SGI PowerCHALLENGEarray symmetric multiprocessor. Two benchmark problems of bitonic sorting and Fast Fourier Transform are selected for experiments. Communication-efficient algorithms are developed to exploit the overlapping capabilities of the machines. Programs are written in Message-Passing Interface for portability and identical codes are used for both machines. Various data sizes and message sizes are used to test the machines' communication capabilities. Experimental results indicate that the communication performance of the multiprocessors is consistent with the size of messages. The SP-2 is sensitive to message size but yields a much higher communication overlapping because of the communication co-processor. The PowerCHALLENGEarray is not highly sensitive to message size and yields a low communication overlapping. Bitonic sorting yields lower performance compared to FFT due to a smaller computation-to-communication ratio.
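For readers unfamiliar with the bitonic sorting benchmark named above, the following is a minimal serial sketch of the textbook algorithm in Python. It is not the authors' MPI code; the function names are illustrative, and the input length is assumed to be a power of two. In a message-passing version, each compare-exchange between positions i and i + half would typically correspond to an exchange between two processors, which is one reason the computation-to-communication ratio is small.

```python
def bitonic_sort(values, ascending=True):
    """Return a sorted copy of `values` (length must be a power of two)."""
    n = len(values)
    if n <= 1:
        return list(values)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    # Sort the halves in opposite directions to form a bitonic sequence.
    first = bitonic_sort(values[:half], True)
    second = bitonic_sort(values[half:], False)
    return _bitonic_merge(first + second, ascending)


def _bitonic_merge(seq, ascending):
    """Merge a bitonic sequence into a monotonic one."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    seq = list(seq)
    for i in range(half):
        # Compare-exchange step: swap when the pair is out of order
        # for the requested direction.
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return (_bitonic_merge(seq[:half], ascending)
            + _bitonic_merge(seq[half:], ascending))
```

Calling `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns the values in ascending order; passing `ascending=False` reverses the direction.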
Evaluation of Job Queuing/Scheduling Software: Phase I Report
NASA Technical Reports Server (NTRS)
Jones, James Patton
1996-01-01
The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, the Numerical Aerodynamic Simulation (NAS) supercomputer facility compiled a requirements checklist for job queuing/scheduling software. Next, NAS began an evaluation of the leading job management system (JMS) software packages against the checklist. This report describes the three-phase evaluation process, and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still insufficient, even in the leading JMSs. However, by ranking each JMS evaluated against the requirements, we provide data that will be useful to other sites in selecting a JMS.
Analysis of tasks for dynamic man/machine load balancing in advanced helicopters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jorgensen, C.C.
1987-10-01
This report considers task allocation requirements imposed by advanced helicopter designs incorporating mixes of human pilots and intelligent machines. Specifically, it develops an analogy between load balancing using distributed non-homogeneous multiprocessors and human team functions. A taxonomy is presented which can be used to identify task combinations likely to cause overload for dynamic scheduling and process allocation mechanisms. Designer criteria are given for function decomposition, separation of control from data, and communication handling for dynamic tasks. Possible effects of NP-complete scheduling problems are noted and a class of combinatorial optimization methods is examined.
NASA Astrophysics Data System (ADS)
Birgin, Ernesto G.; Ronconi, Débora P.
2012-10-01
The single machine scheduling problem with a common due date and non-identical ready times for the jobs is examined in this work. Performance is measured by the minimization of the weighted sum of earliness and tardiness penalties of the jobs. Since this problem is NP-hard, the application of constructive heuristics that exploit specific characteristics of the problem to improve their performance is investigated. The proposed approaches are examined through a computational comparative study on a set of 280 benchmark test problems with up to 1000 jobs.
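The objective described above can be made concrete with a short Python sketch that evaluates the weighted earliness/tardiness cost of one given job sequence. This is only an evaluator, not one of the constructive heuristics from the paper. It assumes each job starts as early as its ready time and the machine allow (no deliberately inserted idle time), and the parameter names are illustrative.

```python
def weighted_et_cost(sequence, ready, proc, due, w_early, w_tardy):
    """Weighted earliness/tardiness cost of a job sequence on one machine.

    sequence: job indices in processing order
    ready, proc: per-job ready times and processing times
    due: the common due date
    w_early, w_tardy: per-job penalty weights
    Assumption: every job starts as soon as possible.
    """
    clock = 0
    cost = 0
    for j in sequence:
        start = max(clock, ready[j])   # wait for the job to become ready
        clock = start + proc[j]        # completion time of job j
        cost += w_early[j] * max(0, due - clock)
        cost += w_tardy[j] * max(0, clock - due)
    return cost
```

With ready times `[0, 2, 0]`, processing times `[3, 2, 4]`, a common due date of 6, earliness weights of 1 and tardiness weights of 2, the sequence `[0, 1, 2]` costs 10 and the sequence `[1, 0, 2]` costs 14.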
NASA Astrophysics Data System (ADS)
Mirabi, Mohammad; Fatemi Ghomi, S. M. T.; Jolai, F.
2014-04-01
The flow-shop scheduling problem (FSP) deals with the scheduling of a set of n jobs that visit a set of m machines in the same order. As the FSP is NP-hard, no efficient algorithm is known that reaches the optimal solution of the problem. To minimize the holding, delay and setup costs of large permutation flow-shop scheduling problems with sequence-dependent setup times on each machine, this paper develops a novel hybrid genetic algorithm (HGA) with three genetic operators. The proposed HGA applies a modified approach to generate a pool of initial solutions, and also uses an improved heuristic called the iterated swap procedure to improve the initial solutions. We consider the make-to-order production approach, in which some sequences between jobs are assumed to be tabu based on maximum allowable setup cost. In addition, the results are compared to some recently developed heuristics, and computational experimental results show that the proposed HGA performs very competitively with respect to accuracy and efficiency of solution.
Multiplexing Low and High QoS Workloads in Virtual Environments
NASA Astrophysics Data System (ADS)
Verboven, Sam; Vanmechelen, Kurt; Broeckhove, Jan
Virtualization technology has introduced new ways for managing IT infrastructure. The flexible deployment of applications through self-contained virtual machine images has removed the barriers for multiplexing, suspending and migrating applications with their entire execution environment, allowing for a more efficient use of the infrastructure. These developments have given rise to an important challenge regarding the optimal scheduling of virtual machine workloads. In this paper, we specifically address the VM scheduling problem in which workloads that require guaranteed levels of CPU performance are mixed with workloads that do not require such guarantees. We introduce a framework to analyze this scheduling problem and evaluate to what extent such mixed service delivery is beneficial for a provider of virtualized IT infrastructure. Traditionally providers offer IT resources under a guaranteed and fixed performance profile, which can lead to underutilization. The findings of our simulation study show that through proper tuning of a limited set of parameters, the proposed scheduling algorithm allows for a significant increase in utilization without sacrificing on performance dependability.
A performance study of sparse Cholesky factorization on INTEL iPSC/860
NASA Technical Reports Server (NTRS)
Zubair, M.; Ghose, M.
1992-01-01
The problem of Cholesky factorization of a sparse matrix has been very well investigated on sequential machines. A number of efficient codes exist for factorizing large unstructured sparse matrices. However, there is a lack of such efficient codes on parallel machines in general, and distributed machines in particular. Some of the issues that are critical to the implementation of sparse Cholesky factorization on a distributed memory parallel machine are ordering, partitioning and mapping, load balancing, and ordering of various tasks within a processor. Here, we focus on the effect of various partitioning schemes on the performance of sparse Cholesky factorization on the Intel iPSC/860. Also, a new partitioning heuristic for structured as well as unstructured sparse matrices is proposed, and its performance is compared with other schemes.
FEM analysis of an single stator dual PM rotors axial synchronous machine
NASA Astrophysics Data System (ADS)
Tutelea, L. N.; Deaconu, S. I.; Popa, G. N.
2017-01-01
The actual e-continuously variable transmission (e-CVT) solution for the parallel Hybrid Electric Vehicle (HEV) requires two electric machines, two inverters, and a planetary gear. A distinct electric generator and a propulsion electric motor, both with full power converters, are typical for a series HEV. In an effort to simplify the planetary-geared e-CVT for the parallel HEV or the series HEV, we propose to replace the two electric machines and their two power converters by a single, axial-air-gap electric machine with a central stator, fed from a single PWM converter with dual frequency voltage output, and two independent PM rotors. The proposed topologies, the magneto-motive force analysis and quasi 3D-FEM analysis are the core of the paper.
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages
2013-01-02
Compilation JVM Java Virtual Machine KB Kilobyte KDT Knowledge Discovery Toolbox LAPACK Linear Algebra Package LLVM Low-Level Virtual Machine LOC Lines...different starting points. Leo Meyerovich also helped solidify some of the ideas here in discussions during Par Lab retreats. I would also like to thank...multi-timestep computations by blocking in both time and space. 88 Implementation Output Approx DSL Type Language Language Parallelism LoC Graphite
Machining heavy plastic sections
NASA Technical Reports Server (NTRS)
Stalkup, O. M.
1967-01-01
Machining technique produces consistently satisfactory plane-parallel optical surfaces for pressure windows, made of plexiglass, required to support a photographic study of liquid rocket combustion processes. The surfaces are machined and polished to the required tolerances and show no degradation from stress relaxation over periods as long as 6 months.
Parallel processors and nonlinear structural dynamics algorithms and software
NASA Technical Reports Server (NTRS)
Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.
1989-01-01
The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the CONNECTION Machine, is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the CONNECTION Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the CONNECTION Machine is capable of outperforming the CRAY XMP/14.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Strout, Michelle
Programming parallel machines is fraught with difficulties: the obfuscation of algorithms due to implementation details such as communication and synchronization, the need for transparency between language constructs and performance, the difficulty of performing program analysis to enable automatic parallelization techniques, and the existence of important "dusty deck" codes. The SAIMI project developed abstractions that enable the orthogonal specification of algorithms and implementation details within the context of existing DOE applications. The main idea is to enable the injection of small programming models such as expressions involving transcendental functions, polyhedral iteration spaces with sparse constraints, and task graphs into full programs through the use of pragmas. These smaller, more restricted programming models enable orthogonal specification of many implementation details such as how to map the computation on to parallel processors, how to schedule the computation, and how to allocate storage for the computation. At the same time, these small programming models enable the expression of the most computationally intense and communication heavy portions in many scientific simulations. The ability to orthogonally manipulate the implementation for such computations will significantly ease performance programming efforts and expose transformation possibilities and parameters to automated approaches such as autotuning. At Colorado State University, the SAIMI project was supported through DOE grant DE-SC3956 from April 2010 through August 2015. The SAIMI project has contributed a number of important results to programming abstractions that enable the orthogonal specification of implementation details in scientific codes. This final report summarizes the research that was funded by the SAIMI project.
Interactive Parallel Data Analysis within Data-Centric Cluster Facilities using the IPython Notebook
NASA Astrophysics Data System (ADS)
Pascoe, S.; Lansdowne, J.; Iwi, A.; Stephens, A.; Kershaw, P.
2012-12-01
The data deluge is making traditional analysis workflows for many researchers obsolete. Support for parallelism within popular tools such as MATLAB, IDL and NCO is not well developed and rarely used. However, parallelism is necessary for processing modern data volumes on a timescale conducive to curiosity-driven analysis. Furthermore, for peta-scale datasets such as the CMIP5 archive, it is no longer practical to bring an entire dataset to a researcher's workstation for analysis, or even to their institutional cluster. Therefore, there is an increasing need to develop new analysis platforms which both enable processing at the point of data storage and provide parallelism. Such an environment should, where possible, maintain the convenience and familiarity of our current analysis environments to encourage curiosity-driven research. We describe how we are combining the interactive python shell (IPython) with our JASMIN data-cluster infrastructure. IPython has been specifically designed to bridge the gap between the HPC-style parallel workflows and the opportunistic curiosity-driven analysis usually carried out using domain specific languages and scriptable tools. IPython offers a web-based interactive environment, the IPython notebook, and a cluster engine for parallelism, all underpinned by the well-respected Python/Scipy scientific programming stack. JASMIN is designed to support the data analysis requirements of the UK and European climate and earth system modeling community. JASMIN, with its sister facility CEMS focusing on the earth observation community, has 4.5 PB of fast parallel disk storage alongside over 370 computing cores that provide local computation. Through the IPython interface to JASMIN, users can make efficient use of JASMIN's multi-core virtual machines to perform interactive analysis on all cores simultaneously or can configure IPython clusters across multiple VMs. Larger-scale clusters can be provisioned through JASMIN's batch scheduling system. Outputs can be summarised and visualised using the full power of Python's many scientific tools, including Scipy, Matplotlib, Pandas and CDAT. This rich user experience is delivered through the user's web browser, maintaining the interactive feel of a workstation-based environment with the parallel power of a remote data-centric processing facility.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Wang, Zhaocai; Ji, Zuwen; Wang, Xiaoming; Wu, Tunhua; Huang, Wei
2017-12-01
As a promising approach to solving computationally intractable problems, DNA computing is an emerging research area that spans mathematics, computer science and molecular biology. The task scheduling problem, a well-known NP-complete problem, assigns n jobs to m individuals and seeks the minimum execution time of the last individual to finish. In this paper, we use a biologically inspired computational model and describe a new parallel algorithm to solve the task scheduling problem by basic DNA molecular operations. In turn, we design flexible length DNA strands to represent elements of the allocation matrix, take appropriate biological experiment operations and get solutions of the task scheduling problem in proper length range with less than O(n^2) time complexity. Copyright © 2017. Published by Elsevier B.V.
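To make the problem statement concrete, here is a brute-force Python sketch of the task scheduling problem as described above: it enumerates every assignment of jobs to individuals and keeps the one whose last individual finishes earliest. This is a conventional exhaustive search for illustration, not the DNA-based algorithm of the paper. The `cost[job][individual]` matrix layout is an assumption, and the search takes time exponential in the number of jobs.

```python
from itertools import product


def min_makespan(cost):
    """Exhaustively find the assignment minimizing the latest finish time.

    cost[job][individual] is the (assumed) time that individual needs
    for that job. Returns (makespan, assignment), where assignment[job]
    is the index of the individual chosen for that job.
    """
    n_jobs = len(cost)
    n_individuals = len(cost[0])
    best = None
    for assignment in product(range(n_individuals), repeat=n_jobs):
        loads = [0] * n_individuals
        for job, who in enumerate(assignment):
            loads[who] += cost[job][who]
        makespan = max(loads)  # finish time of the last individual
        if best is None or makespan < best[0]:
            best = (makespan, assignment)
    return best
```

For the three-job, two-individual matrix `[[2, 3], [4, 1], [3, 3]]`, the minimum makespan found is 4.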
Space shuttle system program definition. Volume 4: Cost and schedule report
NASA Technical Reports Server (NTRS)
1972-01-01
The supporting cost and schedule data for the second half of the Space Shuttle System Phase B Extension Study are summarized. The major objective for this period was to address the cost/schedule differences affecting final selection of the HO orbiter space shuttle system. The contending options under study included the following booster launch configurations: (1) series burn ballistic recoverable booster (BRB), (2) parallel burn ballistic recoverable booster (BRB), (3) series burn solid rocket motors (SRM's), and (4) parallel burn solid rocket motors (SRM's). The implications of varying payload bay sizes for the orbiter, engine type for the ballistic recoverable booster, and SRM motors for the solid booster were examined.
Data parallel sorting for particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1992-01-01
Sorting on a parallel architecture is a communications intensive event which can incur a high penalty in applications where it is required. In the case of particle simulation, only integer sorting is necessary, and sequential implementations easily attain the minimum performance bound of O(N) for N particles. Parallel implementations, however, have to cope with the parallel sorting problem which, in addition to incurring a heavy communications cost, can make the minimum performance bound difficult to attain. This paper demonstrates how the sorting problem in a particle simulation can be reduced to a merging problem, and describes an efficient data parallel algorithm to solve this merging problem in a particle simulation. The new algorithm is shown to be optimal under conditions usual for particle simulation, and its fieldwise implementation on the Connection Machine is analyzed in detail. The new algorithm is about four times faster than a fieldwise implementation of radix sort on the Connection Machine.
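The remark that sequential integer sorting reaches the O(N) bound can be illustrated with a counting sort of particles by integer cell index. The Python sketch below is a standard sequential technique, not the data parallel merging algorithm of the paper. The names `cell_of` and `num_cells` are illustrative, and the cell indices are assumed to lie in the range 0 to num_cells - 1.

```python
def sort_particles_by_cell(cell_of, num_cells):
    """Return particle indices ordered by cell, in O(N + num_cells) time.

    cell_of[p] is the integer cell index of particle p.
    The ordering is stable: particles in the same cell keep their
    original relative order.
    """
    # counts[c + 1] holds the number of particles in cell c.
    counts = [0] * (num_cells + 1)
    for c in cell_of:
        counts[c + 1] += 1
    # Prefix sum: counts[c] becomes the first output slot for cell c.
    for c in range(num_cells):
        counts[c + 1] += counts[c]
    order = [0] * len(cell_of)
    for p, c in enumerate(cell_of):
        order[counts[c]] = p
        counts[c] += 1
    return order
```

For `cell_of = [2, 0, 1, 0, 2]` and three cells, the function returns `[1, 3, 2, 0, 4]`.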
20 CFR 402.165 - Fee schedule.
Code of Federal Regulations, 2011 CFR
2011-04-01
... 20 Employees' Benefits 2 2011-04-01 2011-04-01 false Fee schedule. 402.165 Section 402.165 Employees' Benefits SOCIAL SECURITY ADMINISTRATION AVAILABILITY OF INFORMATION AND RECORDS TO THE PUBLIC... costs of operating the machine, plus the actual cost of the materials used, plus charges for the time...
Reactive Scheduling in Multipurpose Batch Plants
NASA Astrophysics Data System (ADS)
Narayani, A.; Shaik, Munawar A.
2010-10-01
Scheduling is an important operation in process industries for improving resource utilization resulting in direct economic benefits. It has a two-fold objective of fulfilling customer orders within the specified time as well as maximizing the plant profit. Unexpected disturbances such as machine breakdown, arrival of rush orders and cancellation of orders affect the schedule of the plant. Reactive scheduling is generation of a new schedule which has minimum deviation from the original schedule in spite of the occurrence of unexpected events in the plant operation. Recently, Shaik & Floudas (2009) proposed a novel unified model for short-term scheduling of multipurpose batch plants using unit-specific event-based continuous time representation. In this paper, we extend the model of Shaik & Floudas (2009) to handle reactive scheduling.
A high performance parallel algorithm for 1-D FFT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Agarwal, R.C.; Gustavson, F.G.; Zubair, M.
1994-12-31
In this paper the authors propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. They use this to solve a commonly encountered FFT based kernel on a distributed memory parallel machine, the IBM scalable parallel system, SP1. The kernel requires a forward FFT computation of an input sequence, multiplication of the transformed data by a coefficient array, and finally an inverse FFT computation of the resultant data. They show that the multi-dimensional formulation helps in reducing the communication costs and also improves the single node performance by effectively utilizing the memory system of the node. They implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.
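The three-step kernel described above (forward FFT, multiplication by a coefficient array, inverse FFT) can be sketched serially in Python as follows. This uses a plain recursive radix-2 FFT and does not reproduce the authors' multi-dimensional formulation or any parallelism. The function names are illustrative, and the input length is assumed to be a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_kernel(signal, coeffs):
    """Forward FFT, pointwise multiply by coeffs, then inverse FFT."""
    spectrum = fft(signal)
    scaled = [s * c for s, c in zip(spectrum, coeffs)]
    n = len(signal)
    # The inverse transform above is unnormalized, so divide by n here.
    return [v / n for v in fft(scaled, inverse=True)]
```

With a coefficient array of all ones the kernel returns the input, up to rounding. With the coefficients set to the FFT of a unit impulse at index 1, it returns the input circularly shifted by one position.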
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Geveci, Berk
2014-11-01
The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based off what is known as the visualization pipeline. In the pipeline model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.
A Data Parallel Multizone Navier-Stokes Code
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)
1995-01-01
We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1989-01-01
Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2^r elements and the hypercube has P = 2^d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
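The transmission counts quoted above can be checked with a few lines of Python. The sketch simply evaluates the two expressions from the abstract for a sequence of 2 to the power r elements on 2 to the power d processors and confirms that their difference is r - d + 1. The function name is illustrative, and exact fractions are used so that odd r is handled without rounding.

```python
from fractions import Fraction


def transmission_counts(r, d):
    """Parallel transmissions for standard-order and A-order FFTs.

    r: log2 of the sequence length, d: log2 of the processor count.
    Returns (standard, a_order, saving) using the abstract's formulas.
    """
    half_r = Fraction(r, 2)
    standard = d + half_r + 1   # d + r/2 + 1
    a_order = 2 * d - half_r    # 2d - r/2
    return standard, a_order, standard - a_order
```

For example, with r = 20 and d = 16 the counts are 27 and 22, a saving of 5, which equals r - d + 1.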
Social energy: mining energy from the society
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jun Jason; Gao, David Wenzhong; Zhang, Yingchen
The inherent nature of energy, i.e., physicality, sociality and informatization, implies the inevitable and intensive interaction between energy systems and social systems. From this perspective, we define 'social energy' as a complex sociotechnical system of energy systems, social systems and the derived artificial virtual systems which characterize the intense intersystem and intra-system interactions. The recent advancement in intelligent technology, including artificial intelligence and machine learning technologies, sensing and communication in Internet of Things technologies, and massive high performance computing and extreme-scale data analytics technologies, enables the possibility of substantial advancement in socio-technical system optimization, scheduling, control and management. In this paper, we provide a discussion on the nature of energy, and then propose the concept and intention of social energy systems for electrical power. A general methodology of establishing and investigating social energy is proposed, which is based on the ACP approach, i.e., 'artificial systems' (A), 'computational experiments' (C) and 'parallel execution' (P), and parallel system methodology. A case study on the University of Denver (DU) campus grid is provided and studied to demonstrate the social energy concept. In the concluding remarks, we discuss the technical pathway, in both social and nature sciences, to social energy, and our vision on its future.
1984-06-29
sheet metal, machined and composite parts and assembling the components into final products o Planning, evaluating, testing, inspecting and...Research showed that current programs were pursuing the design and demonstration of integrated centers for sheet metal, machining and composite ...determine any metal parts required and to schedule these requirements from the machining center. Figure 3-33, Planned Composite Production, shows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boman, Erik G.
This LDRD project was a campus exec fellowship to fund (in part) Donald Nguyen’s PhD research at UT-Austin. His work has focused on parallel programming models, and scheduling irregular algorithms on shared-memory systems using the Galois framework. Galois provides a simple but powerful way for users and applications to automatically obtain good parallel performance using certain supported data containers. The naïve user can write serial code, while advanced users can optimize performance by advanced features, such as specifying the scheduling policy. Galois was used to parallelize two sparse matrix reordering schemes: RCM and Sloan. Such reordering is important in high-performance computing to obtain better data locality and thus reduce run times.
Conceptual Study of Permanent Magnet Machine Ship Propulsion Systems
1977-12-01
cycloconverter subsystem is designed using advanced thyristors and can be either water or air cooled. The machine-cycloconverter, many-phase or parallel...Turnb, Phase, Poles, Air Gap ................................. 3-9 3-5 Machine Characteristics Versus Number of Poles (large machine, 40 000 hp). Poles...cylindrical permanent magnet generator forces the power conditioner to provide for both frequency change and voltage control. The complexity of this dual
Pre-resistance-welding resistance check
Destefan, Dennis E.; Stompro, David A.
1991-01-01
A preweld resistance check for resistance welding machines uses an open circuited measurement to determine the welding machine resistance, a closed circuit measurement to determine the parallel resistance of a workpiece set and the machine, and a calculation to determine the resistance of the workpiece set. Any variation in workpiece set or machine resistance is an indication that the weld may be different from a control weld.
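The calculation step mentioned above can be sketched in Python, assuming the workpiece set and the machine behave as two ideal resistances in parallel, so that 1/R_parallel = 1/R_machine + 1/R_workpiece. The source does not spell out the formula, so this relation and the function name are assumptions made for illustration.

```python
def workpiece_resistance(r_machine, r_parallel):
    """Solve the ideal parallel-resistance relation for the workpiece set.

    r_machine: open-circuit measurement (machine alone), in ohms
    r_parallel: closed-circuit measurement (machine in parallel with
                the workpiece set), in ohms
    Assumes 1/r_parallel = 1/r_machine + 1/r_workpiece.
    """
    if not 0 < r_parallel < r_machine:
        raise ValueError("expected 0 < r_parallel < r_machine")
    return r_machine * r_parallel / (r_machine - r_parallel)
```

For instance, a machine resistance of 6 ohms and a parallel reading of 2 ohms imply a workpiece set resistance of 3 ohms under this assumption.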
Advances in Parallelization for Large Scale Oct-Tree Mesh Generation
NASA Technical Reports Server (NTRS)
O'Connell, Matthew; Karman, Steve L.
2015-01-01
Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.
Towards a Better Distributed Framework for Learning Big Data
2017-06-14
This work aimed at solving issues in distributed machine learning. The PI’s team proposed...communication load. Finally, the team proposed the parallel least-squares policy iteration (parallel LSPI) to parallelize reinforcement policy learning.
Research on Production Scheduling System with Bottleneck Based on Multi-agent
NASA Astrophysics Data System (ADS)
Zhenqiang, Bao; Weiye, Wang; Peng, Wang; Pan, Quanke
To address the imbalance of resource capacity in a Production Scheduling System, this paper builds on a previously constructed multi-agent Production Scheduling System and exploits the dynamic and autonomous nature of agents, so that the bottleneck problem in scheduling is solved dynamically. First, the paper uses a Bottleneck Resource Agent to identify the bottleneck resource in the production line, analyses the inherent mechanism of the bottleneck, and describes the production scheduling process based on the bottleneck resource. A Bottleneck Decomposition Agent then harmonizes the relationship between each job's arrival time and transfer time in the Bottleneck Resource Agent and the Non-Bottleneck Resource Agents, so that the dynamic scheduling problem is simplified to single-machine scheduling for each resource that takes part in the schedule. Finally, the dynamic real-time scheduling problem is effectively solved in the Production Scheduling System.
Performance prediction: A case study using a multi-ring KSR-1 machine
NASA Technical Reports Server (NTRS)
Sun, Xian-He; Zhu, Jianping
1995-01-01
While computers with tens of thousands of processors have successfully delivered high performance power for solving some of the so-called 'grand-challenge' applications, the notion of scalability is becoming an important metric in the evaluation of parallel machine architectures and algorithms. In this study, the prediction of scalability and its application are carefully investigated. A simple formula is presented to show the relation between scalability, single processor computing power, and degradation of parallelism. A case study is conducted on a multi-ring KSR1 shared virtual memory machine. Experimental and theoretical results show that the influence of topology variation of an architecture is predictable. Therefore, the performance of an algorithm on a sophisticated, hierarchical architecture can be predicted and the best algorithm-machine combination can be selected for a given application.
Code of Federal Regulations, 2011 CFR
2011-07-01
... 41 Public Contracts and Property Management 2 2011-07-01 2007-07-01 true Requisitioning tabulating... Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT... electrical and mechanical contact tabulating machines, including aperture cards and copy cards. Federal...
Code of Federal Regulations, 2010 CFR
2010-07-01
... 41 Public Contracts and Property Management 2 2010-07-01 2010-07-01 true Requisitioning tabulating... Contracts and Property Management Federal Property Management Regulations System FEDERAL PROPERTY MANAGEMENT... electrical and mechanical contact tabulating machines, including aperture cards and copy cards. Federal...
5 CFR 532.279 - Special wage schedules for printing positions.
Code of Federal Regulations, 2011 CFR
2011-01-01
... Opaquer 4 Offset Press Helper 5 Bindery Machine Operator (Helper) 5 Film Assembler-Stripper (Single Flat-Single Color) 5 Platemaker (Single Color) 5 Film Assembler-Stripper (Partial and Composite Flats) 7... Cutter) 8 Bindery Machine Operator (Power Folder) 8 Film Assembler-Stripper (Multiple Flat-Multiple Color...
5 CFR 532.279 - Special wage schedules for printing positions.
Code of Federal Regulations, 2012 CFR
2012-01-01
... Opaquer 4 Offset Press Helper 5 Bindery Machine Operator (Helper) 5 Film Assembler-Stripper (Single Flat-Single Color) 5 Platemaker (Single Color) 5 Film Assembler-Stripper (Partial and Composite Flats) 7... Cutter) 8 Bindery Machine Operator (Power Folder) 8 Film Assembler-Stripper (Multiple Flat-Multiple Color...
Automated problem scheduling and reduction of synchronization delay effects
NASA Technical Reports Server (NTRS)
Saltz, Joel H.
1987-01-01
It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable performance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acyclic graphs, and (2) such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs.
ComprehensiveBench: a Benchmark for the Extensive Evaluation of Global Scheduling Algorithms
NASA Astrophysics Data System (ADS)
Pilla, Laércio L.; Bozzetti, Tiago C.; Castro, Márcio; Navaux, Philippe O. A.; Méhaut, Jean-François
2015-10-01
Parallel applications that present tasks with imbalanced loads or complex communication behavior usually do not exploit the underlying resources of parallel platforms to their full potential. In order to mitigate this issue, global scheduling algorithms are employed. As finding the optimal task distribution is an NP-Hard problem, identifying the most suitable algorithm for a specific scenario and comparing algorithms are not trivial tasks. In this context, this paper presents ComprehensiveBench, a benchmark for global scheduling algorithms that enables the variation of a vast range of parameters that affect performance. ComprehensiveBench can be used to assist in the development and evaluation of new scheduling algorithms, to help choose a specific algorithm for an arbitrary application, to emulate other applications, and to enable statistical tests. We illustrate its use in this paper with an evaluation of Charm++ periodic load balancers that stresses their characteristics.
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Multi-objective group scheduling optimization integrated with preventive maintenance
NASA Astrophysics Data System (ADS)
Liao, Wenzhu; Zhang, Xiufang; Jiang, Min
2017-11-01
This article proposes a single-machine-based integration model to meet the requirements of production scheduling and preventive maintenance in group production. To describe the production for identical/similar and different jobs, this integrated model considers the learning and forgetting effects. Based on machine degradation, the deterioration effect is also considered. Moreover, perfect maintenance and minimal repair are adopted in this integrated model. The multi-objective of minimizing total completion time and maintenance cost is taken to meet the dual requirements of delivery date and cost. Finally, a genetic algorithm is developed to solve this optimization model, and the computation results demonstrate that this integrated model is effective and reliable.
NASA Astrophysics Data System (ADS)
Jiang, Fuhong; Zhang, Xingong; Bai, Danyu; Wu, Chin-Chia
2018-04-01
In this article, a competitive two-agent scheduling problem in a two-machine open shop is studied. The objective is to minimize the weighted sum of the makespans of two competitive agents. A complexity proof is presented for minimizing the weighted combination of the makespan of each agent if the weight α belonging to agent B is arbitrary. Furthermore, two pseudo-polynomial-time algorithms using the largest alternate processing time (LAPT) rule are presented. Finally, two approximation algorithms are presented if the weight is equal to one. Additionally, another approximation algorithm is presented if the weight is larger than one.
NASA Astrophysics Data System (ADS)
Sembiring, N.; Ginting, E.; Darnello, T.
2017-12-01
A company that produces refined sugar faces the problem that its production floor has not reached the required availability level for critical machines because they often suffer damage (breakdown). This results in a sudden loss of production time and production opportunities. The problem can be addressed with the Reliability Engineering method, in which a statistical approach is applied to historical damage data to identify the pattern of the distribution. The method can provide the reliability, rate of damage, and availability level of a machine during the maintenance time interval schedule. The distribution test on the time-between-damage data (MTTF) gives a lognormal distribution for the flexible hose component and a Weibull distribution for the teflon cone lifthing component. The distribution test on the mean time of improvement (MTTR) gives an exponential distribution for the flexible hose component and a Weibull distribution for the teflon cone lifthing component. The actual results for the flexible hose component on a replacement schedule of every 720 hours are a reliability of 0.2451 and an availability of 0.9960. The actual results for the critical teflon cone lifthing component on a replacement schedule of every 1944 hours are a reliability of 0.4083 and an availability of 0.9927.
Automated Planning and Scheduling for Space Mission Operations
NASA Technical Reports Server (NTRS)
Chien, Steve; Jonsson, Ari; Knight, Russell
2005-01-01
Research Trends: a) Finite-capacity scheduling under more complex constraints and increased problem dimensionality (subcontracting, overtime, lot splitting, inventory, etc.) b) Integrated planning and scheduling. c) Mixed-initiative frameworks. d) Management of uncertainty (proactive and reactive). e) Autonomous agent architectures and distributed production management. f) Integration of machine learning capabilities. g) Wider scope of applications: 1) analysis of supplier/buyer protocols & tradeoffs; 2) integration of strategic & tactical decision-making; and 3) enterprise integration.
NASA Technical Reports Server (NTRS)
Schreiber, Robert; Simon, Horst D.
1992-01-01
We are surveying current projects in the area of parallel supercomputers. The machines considered here will become commercially available in the 1990 - 1992 time frame. All are suitable for exploring the critical issues in applying parallel processors to large scale scientific computations, in particular CFD calculations. This chapter presents an overview of the surveyed machines, and a detailed analysis of the various architectural and technology approaches taken. Particular emphasis is placed on the feasibility of a Teraflops capability following the paths proposed by various developers.
Research on precision grinding technology of large scale and ultra thin optics
NASA Astrophysics Data System (ADS)
Zhou, Lian; Wei, Qiancai; Li, Jie; Chen, Xianhua; Zhang, Qinghua
2018-03-01
The flatness and parallelism errors of large scale and ultra thin optics have an important influence on the subsequent polishing efficiency and accuracy. In order to realize high precision grinding of those ductile elements, a low deformation vacuum chuck was designed first and used to clamp the optics with high supporting rigidity over the full aperture. The optics were then planar ground under vacuum adsorption. After machining, the vacuum system was turned off. The form error of the optics was measured on-machine using a displacement sensor after elastic restitution. The flatness was then converged with high accuracy by compensation machining, whose trajectories were integrated with the measurement result. To obtain high parallelism, the optics were turned over and compensation ground using the form error of the vacuum chuck. Finally, a grinding experiment on large scale and ultra thin fused silica optics with an aperture of 430mm×430mm×10mm was performed. The best P-V flatness of the optics was below 3 μm, and the parallelism was below 3″. This machining technique has been applied in batch grinding of large scale and ultra thin optics.
Parallel algorithms for boundary value problems
NASA Technical Reports Server (NTRS)
Lin, Avi
1990-01-01
A general approach to solving boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step, where all the P available processors work in parallel, and the global step, where one processor solves a tridiagonal linear system of order P. The main advantages of this approach are twofold. First, the suggested approach is very flexible, especially in the local step, and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Second, the communication complexity is very small, so the algorithm can be used just as easily with shared memory machines. Several examples of using this strategy are discussed.
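The global step described above reduces to one tridiagonal solve of order P. The abstract does not say which solver is used; the following is only a minimal sketch that assumes the standard Thomas algorithm (forward elimination and back substitution, no pivoting) on a well-conditioned system. The function name and argument layout are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    lower[i] multiplies x[i-1] in row i (lower[0] is ignored);
    upper[i] multiplies x[i+1] in row i (upper[-1] is ignored).
    Assumed, not taken from the source: the system is diagonally dominant,
    so no pivot becomes zero.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    # Forward elimination: remove the sub-diagonal row by row.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution from the last unknown upwards.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Illustrative system of order P = 4 (one unknown per processor).
P = 4
solution = solve_tridiagonal([-1.0] * P, [2.0] * P, [-1.0] * P, [1.0, 0.0, 0.0, 1.0])
```

The solve costs O(P) operations, which is small next to the local step whenever each processor owns many grid points.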
Vascular system modeling in parallel environment - distributed and shared memory approaches
Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne
2011-01-01
The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Parallel matrix multiplication on the Connection Machine
NASA Technical Reports Server (NTRS)
Tichy, Walter F.
1988-01-01
Matrix multiplication is a computation and communication intensive problem. Six parallel algorithms for matrix multiplication on the Connection Machine are presented and compared with respect to their performance and processor usage. For n by n matrices, the algorithms have theoretical running times of O(n^2 log n), O(n log n), O(n), and O(log n), and require n, n^2, n^2, and n^3 processors, respectively. With careful attention to communication patterns, the theoretically predicted runtimes can indeed be achieved in practice. The parallel algorithms illustrate the tradeoffs between performance, communication cost, and processor usage.
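One way to read the tradeoff stated above is through the processor-time product, pairing each running time with its processor count in the order the abstract lists them. This is a worked illustration only; the pairing follows the abstract's "respectively", and the comparison with serial multiplication is an added observation, not a claim from the source.

```latex
\begin{align*}
n \cdot O(n^{2}\log n)   &= O(n^{3}\log n) \\
n^{2} \cdot O(n\log n)   &= O(n^{3}\log n) \\
n^{2} \cdot O(n)         &= O(n^{3}) \\
n^{3} \cdot O(\log n)    &= O(n^{3}\log n)
\end{align*}
```

Under this reading, only the O(n)-time variant matches the O(n^3) operation count of classical serial matrix multiplication; the others carry an extra log n factor in total work.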
NASA Technical Reports Server (NTRS)
Gryphon, Coranth D.; Miller, Mark D.
1991-01-01
PCLIPS (Parallel CLIPS) is a set of extensions to the C Language Integrated Production System (CLIPS) expert system language. PCLIPS is intended to provide an environment for the development of more complex, extensive expert systems. Multiple CLIPS expert systems are now capable of running simultaneously on separate processors, or separate machines, thus dramatically increasing the scope of solvable tasks within the expert systems. As a tool for parallel processing, PCLIPS allows for an expert system to add to its fact-base information generated by other expert systems, thus allowing systems to assist each other in solving a complex problem. This allows individual expert systems to be more compact and efficient, and thus run faster or on smaller machines.
Parallel software support for computational structural mechanics
NASA Technical Reports Server (NTRS)
Jordan, Harry F.
1987-01-01
The application of the parallel programming methodology known as the Force was conducted. Two application issues were addressed. The first involves the efficiency of the implementation and its completeness in terms of satisfying the needs of other researchers implementing parallel algorithms. Support for, and interaction with, other Computational Structural Mechanics (CSM) researchers using the Force was the main issue, but some independent investigation of the Barrier construct, which is extremely important to overall performance, was also undertaken. Another efficiency issue which was addressed was that of relaxing the strong synchronization condition imposed on the self-scheduled parallel DO loop. The Force was extended by the addition of logical conditions to the cases of a parallel case construct and by the inclusion of a self-scheduled version of this construct. The second issue involved applying the Force to the parallelization of finite element codes such as those found in the NICE/SPAR testbed system. One of the more difficult problems encountered is the determination of what information in COMMON blocks is actually used outside of a subroutine and when a subroutine uses a COMMON block merely as scratch storage for internal temporary results.
Outsourcing and scheduling for a two-machine flow shop with release times
NASA Astrophysics Data System (ADS)
Ahmadizar, Fardin; Amiri, Zeinab
2018-03-01
This article addresses a two-machine flow shop scheduling problem where jobs are released intermittently and outsourcing is allowed. The first operations of outsourced jobs are processed by the first subcontractor, they are transported in batches to the second subcontractor for processing their second operations, and finally they are transported back to the manufacturer. The objective is to select a subset of jobs to be outsourced, to schedule both the in-house and the outsourced jobs, and to determine a transportation plan for the outsourced jobs so as to minimize the sum of the makespan and the outsourcing and transportation costs. Two mathematical models of the problem and several necessary optimality conditions are presented. A solution approach is then proposed by incorporating the dominance properties with an ant colony algorithm. Finally, computational experiments are conducted to evaluate the performance of the models and solution approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S; Henz, Brian J
2012-01-01
In prior work (Yoginath and Perumalla, 2011; Yoginath, Perumalla and Henz, 2012), the motivation, challenges and issues were articulated in favor of virtual time ordering of Virtual Machines (VMs) in network simulations hosted on multi-core machines. Two major components in the overall virtualization challenge are (1) virtual timeline establishment and scheduling of VMs, and (2) virtualization of inter-VM communication. Here, we extend prior work by presenting scaling results for the first component, with experiment results on up to 128 VMs scheduled in virtual time order on a single 12-core host. We also explore the solution space of design alternatives for the second component, and present performance results from a multi-threaded, multi-queue implementation of inter-VM network control for synchronized execution with VM scheduling, incorporated in our NetWarp simulation system.
Navy Acquisition: Cost, Schedule, and Performance of New Submarine Combat Systems
1990-01-01
1985). In December 1983 the Navy awarded the International Business Machines...contracts to the General Electric Company and the International Business Machines. In December 1987 the Navy selected General Electric as the prime...contractor and International Business Machines as the "follower" contractor. On March 31, 1988, the Navy awarded General Electric a $1.84 billion fixed
Parallel Algorithms for Computer Vision
1990-04-01
NA86-1, Thinking Machines Corporation, Cambridge, MA, December 1986. [43] J. Little, G. Blelloch, and T. Cass. How to program the connection machine for... to program the connection machine for computer vision. In Proc. Workshop on Comp. Architecture for Pattern Analysis and Machine Intell., 1987. [92] J...In Proceedings of SPIE Conf. on Advances in Intelligent Robotics Systems, Bellingham, VA, 1987. SPIE. [91] J. Little, G. Blelloch, and T. Cass. How
Analysis of multigrid methods on massively parallel computers: Architectural implications
NASA Technical Reports Server (NTRS)
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10^6 and 10^9, respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests that an efficient implementation requires the machine to support the efficient transmission of long messages (up to 1000 words), or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests that low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Batch Scheduling for Hybrid Assembly Differentiation Flow Shop to Minimize Total Actual Flow Time
NASA Astrophysics Data System (ADS)
Maulidya, R.; Suprayogi; Wangsaputra, R.; Halim, A. H.
2018-03-01
A hybrid assembly differentiation flow shop is a three-stage flow shop consisting of Machining, Assembly and Differentiation Stages and producing different types of products. In the machining stage, parts are processed in batches on different (unrelated) machines. In the assembly stage, the different parts are assembled into an assembly product. Finally, the assembled products are further processed into different types of final products in the differentiation stage. In this paper, we develop a batch scheduling model for a hybrid assembly differentiation flow shop to minimize the total actual flow time, defined as the total time parts spend on the shop floor from their arrival times until their due dates. We also propose a heuristic algorithm for solving the problem. The proposed algorithm is tested using a set of hypothetical data. The solution shows that the algorithm can solve the problem effectively.
NASA Astrophysics Data System (ADS)
Yusriski, R.; Sukoyo; Samadhi, T. M. A. A.; Halim, A. H.
2018-03-01
This research deals with a single machine batch scheduling model considering the influence of learning, forgetting, and machine deterioration effects. The objective of the model is to minimize total inventory holding cost, and the decision variables are the number of batches (N), the batch sizes (Q[i], i = 1, 2, ..., N) and the sequence of processing the resulting batches. The parts to be processed are received at the right time and in the right quantities, and all completed parts must be delivered at a common due date. We propose a heuristic procedure based on the Lagrange method to solve the problem. The effectiveness of the procedure is evaluated by comparing the resulting solution with the optimal solution obtained from an enumeration procedure using the integer composition technique; the average effectiveness is 94%.
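The enumeration baseline mentioned above can be pictured as listing every integer composition of the total part quantity, where each composition fixes both the number of batches N and the ordered batch sizes Q[1], ..., Q[N]. The sketch below is an illustration under the assumption that batch sizes are positive integers; the function names are hypothetical and the cost model of the paper is not reproduced.

```python
from itertools import combinations


def compositions(total, parts):
    """Yield every ordered way to split `total` into `parts` positive integers."""
    if parts < 1 or total < parts:
        return
    # Choose parts-1 cut points among the total-1 gaps between unit items.
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0,) + cuts + (total,)
        yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def all_batchings(total):
    """Yield candidate batch-size vectors for every batch count N = 1..total."""
    for n_batches in range(1, total + 1):
        yield from compositions(total, n_batches)


# Illustrative quantity of 5 parts: 2**(5-1) = 16 candidate batchings.
candidates = list(all_batchings(5))
```

The count doubles with every additional part (2^(q-1) compositions for q parts), which is why exhaustive enumeration serves only as a reference for small instances and a heuristic is needed otherwise.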
Dynamic Load Balancing for Grid Partitioning on a SP-2 Multiprocessor: A Framework
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Simon, Horst; Lasinski, T. A. (Technical Monitor)
1994-01-01
Computational requirements of full scale computational fluid dynamics change as computation progresses on a parallel machine. The change in computational intensity causes workload imbalance of processors, which in turn requires a large amount of data movement at runtime. If parallel CFD is to be successful on a parallel or massively parallel machine, balancing of the runtime load is indispensable. Here a framework is presented for dynamic load balancing for CFD applications, called Jove. One processor is designated as a decision maker Jove while others are assigned to computational fluid dynamics. Processors running CFD send flags to Jove in a predetermined number of iterations to initiate load balancing. Jove starts working on load balancing while other processors continue working with the current data and load distribution. Jove goes through several steps to decide if the new data should be taken, including preliminary evaluation, partition, processor reassignment, cost evaluation, and decision. Jove running on a single IBM SP2 node has been completely implemented. Preliminary experimental results show that the Jove approach to dynamic load balancing can be effective for full scale grid partitioning on the target machine IBM SP2.
The influence of foot position on scrum kinetics during machine scrummaging.
Bayne, Helen; Kat, Cor-Jacques
2018-05-23
The purpose of this study was to investigate the effect of variations in the alignment of the feet on scrum kinetics during machine scrummaging. Twenty-nine rugby forwards from amateur-level teams completed maximal scrum efforts against an instrumented scrum machine, with the feet in parallel and non-parallel positions. Three-dimensional forces, the moment about the vertical axis and sagittal plane joint angles were measured during the sustained pushing phase. There was a decrease in the magnitude of the resultant force and compression force in both of the non-parallel conditions compared to parallel, and larger compression forces were associated with more extended hip and knee angles. Scrummaging with the left foot forward resulted in the lateral force being directed more towards the left and the turning moment becoming more clockwise. These directional changes were reversed when scrummaging with the right foot forward. Scrummaging with the right foot positioned ahead of the left may serve to counteract the natural clockwise wheel of the live scrum and could be used to achieve an anti-clockwise rotation of the scrum for tactical reasons. However, this would be associated with lower resultant forces and a greater lateral shear force component directed towards the right.
Techniques for cash management in scheduling manufacturing operations
NASA Astrophysics Data System (ADS)
Morady Gohareh, Mehdy; Shams Gharneh, Naser; Ghasemy Yaghin, Reza
2017-06-01
The objective in traditional scheduling is usually time based. Minimizing the makespan, total flow time, total tardiness costs, etc. are instances of these objectives. In manufacturing, processing each job entails paying a cost and receiving a price. Thus, the objective should include some notion of managing the flow of cash. We define two new objectives: maximization of average available cash and maximization of minimum available cash. For single machine scheduling, it is demonstrated that scheduling jobs in decreasing order of profit ratios maximizes the former and improves productivity. Moreover, scheduling jobs in increasing order of costs and breaking ties in decreasing order of prices maximizes the latter and creates protection against financial instability.
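The second sequencing rule above is fully specified by the abstract, so it can be sketched as a two-level sort: increasing cost first, then decreasing price among equal costs. The `Job` type, its field names, and the sample values are hypothetical. The first rule is not sketched because the abstract does not define the profit ratio.

```python
from typing import NamedTuple


class Job(NamedTuple):
    name: str     # hypothetical identifier
    cost: float   # amount paid to process the job
    price: float  # amount received for the job


def order_for_min_cash(jobs):
    """Sort by increasing cost, breaking ties by decreasing price."""
    return sorted(jobs, key=lambda job: (job.cost, -job.price))


# Illustrative data only: B and C tie on cost, so the higher price goes first.
jobs = [Job("A", 5.0, 9.0), Job("B", 3.0, 4.0), Job("C", 3.0, 7.0)]
sequence = [job.name for job in order_for_min_cash(jobs)]
```

Negating the price inside the sort key keeps the rule to a single stable sort rather than two passes.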
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Grondona, M
2002-12-19
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling and stream copy modules. This paper presents an overview of the SLURM architecture and functionality.
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Grondona, M
2003-04-22
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling, and stream copy modules. This paper presents an overview of the SLURM architecture and functionality.
49 CFR 214.531 - Schedule of repairs; general.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Hi-Rail Vehicles § 214.531 Schedule of repairs; general. Except as provided in §§ 214.527(c)(5), 214.529, and 214.533, an on-track roadway maintenance machine or hi-rail vehicle that does not meet all... or hi-rail vehicle shall be placed out of on-track service. ...
Performance Evaluation in Network-Based Parallel Computing
NASA Technical Reports Server (NTRS)
Dezhgosha, Kamyar
1996-01-01
Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine), which is a software system for linking clusters of machines. Second, a set of three basic applications was selected. The applications consist of a parallel search, a parallel sort, and a parallel matrix multiplication. These application programs were implemented in the C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time, which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes is in many cases the restricting factor to performance. That is, coarse-grain parallelism, which requires less frequent communication between processes, will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
Second Evaluation of Job Queuing/Scheduling Software. Phase 1
NASA Technical Reports Server (NTRS)
Jones, James Patton; Brickell, Cristy; Chancellor, Marisa (Technical Monitor)
1997-01-01
The recent proliferation of high performance workstations and the increased reliability of parallel systems have illustrated the need for robust job management systems to support parallel applications. To address this issue, NAS compiled a requirements checklist for job queuing/scheduling software. Next, NAS evaluated the leading job management system (JMS) software packages against the checklist. A year has now elapsed since the first comparison was published, and NAS has repeated the evaluation. This report describes this second evaluation, and presents the results of Phase 1: Capabilities versus Requirements. We show that JMS support for running parallel applications on clusters of workstations and parallel systems is still lacking; however, definite progress has been made by the vendors to correct the deficiencies. This report is supplemented by a WWW interface to the data collected, to aid other sites in extracting the evaluation information on specific requirements of interest.
Implementation of an ADI method on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
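Each ADI sweep reduces to a set of independent tridiagonal systems. Gaussian elimination specialized to a tridiagonal matrix is commonly known as the Thomas algorithm, and a minimal sequential sketch is given here for orientation. It is not the authors' FLEX/32 or CRAY/2 code, it omits pivoting, and the argument layout (`lower[0]` and `upper[-1]` unused) is a convention chosen for this example; the cyclic elimination variant used on the MPP is not shown.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Gaussian elimination (Thomas algorithm).

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    All four lists have length n; lower[0] and upper[n-1] are ignored.
    No pivoting is done, so the matrix is assumed to be well conditioned
    (for example diagonally dominant).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward elimination
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# A 3x3 system of the kind produced by an implicit diffusion step.
solution = solve_tridiagonal(
    lower=[0.0, -1.0, -1.0],
    diag=[2.0, 2.0, 2.0],
    upper=[-1.0, -1.0, 0.0],
    rhs=[1.0, 0.0, 1.0],
)
```

The forward sweep carries a dependence from row `i - 1` to row `i`, so a single system is inherently sequential; on a machine with a small number of processors, parallelism would come from solving many such systems at once.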
Implementation of an ADI method on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
In this paper the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16K bit-serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Code of Federal Regulations, 2014 CFR
2014-07-01
... Conditioning/Heat Pump Equipment Domestic and commercial air conditioning and refrigeration equipment fall... cooling/heat cycle. 8415.82.00 Other, incorporating a refrigerating unit— Self-contained machines and... refrigerating or freezing equipment, electric or other; heat pumps, other than air conditioning machines of...
Code of Federal Regulations, 2011 CFR
2011-07-01
... Conditioning/Heat Pump Equipment Domestic and commercial air conditioning and refrigeration equipment fall... cooling/heat cycle. 8415.82.00 Other, incorporating a refrigerating unit— Self-contained machines and... refrigerating or freezing equipment, electric or other; heat pumps, other than air conditioning machines of...
Code of Federal Regulations, 2010 CFR
2010-07-01
... Conditioning/Heat Pump Equipment Domestic and commercial air conditioning and refrigeration equipment fall... cooling/heat cycle. 8415.82.00 Other, incorporating a refrigerating unit— Self-contained machines and... refrigerating or freezing equipment, electric or other; heat pumps, other than air conditioning machines of...
Code of Federal Regulations, 2013 CFR
2013-07-01
... Conditioning/Heat Pump Equipment Domestic and commercial air conditioning and refrigeration equipment fall... cooling/heat cycle. 8415.82.00 Other, incorporating a refrigerating unit— Self-contained machines and... refrigerating or freezing equipment, electric or other; heat pumps, other than air conditioning machines of...
Code of Federal Regulations, 2012 CFR
2012-07-01
... Conditioning/Heat Pump Equipment Domestic and commercial air conditioning and refrigeration equipment fall... cooling/heat cycle. 8415.82.00 Other, incorporating a refrigerating unit— Self-contained machines and... refrigerating or freezing equipment, electric or other; heat pumps, other than air conditioning machines of...
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.
Andrade, G; Ferreira, R; Teodoro, George; Rocha, Leonardo; Saltz, Joel H; Kurc, Tahsin
2014-10-01
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed on a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
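To make the baseline named in this abstract concrete, the following sketch shows a greatly simplified earliest-finish-time heuristic for independent tasks on heterogeneous devices. It is not HEFT proper, which also ranks tasks along a dependency graph and accounts for communication, and it is not the authors' performance aware scheduler. The task names, device names, and time estimates are hypothetical.

```python
def earliest_finish_assignment(task_times):
    """Greedy earliest-finish-time assignment of independent tasks.

    task_times maps each task to {device: estimated execution time}.
    Tasks are taken in decreasing order of their average estimated time
    and placed on whichever device would finish them soonest.
    Returns ({task: (device, start, finish)}, makespan).
    """
    devices = sorted({dev for times in task_times.values() for dev in times})
    ready_at = {dev: 0.0 for dev in devices}  # time at which each device is free

    def average_time(task):
        times = task_times[task]
        return sum(times.values()) / len(times)

    schedule = {}
    for task in sorted(task_times, key=average_time, reverse=True):
        times = task_times[task]
        best = min(times, key=lambda dev: ready_at[dev] + times[dev])
        start = ready_at[best]
        ready_at[best] = start + times[best]
        schedule[task] = (best, start, ready_at[best])
    return schedule, max(ready_at.values(), default=0.0)


# Hypothetical per-device time estimates for three fine-grain tasks.
estimates = {
    "segment": {"cpu": 8, "gpu": 2},
    "feature": {"cpu": 4, "gpu": 3},
    "io": {"cpu": 1, "gpu": 5},
}
schedule, makespan = earliest_finish_assignment(estimates)
```

Because every decision depends on the estimated times, a heuristic of this kind is sensitive to errors in those estimates, which is the kind of inaccuracy the abstract says the proposed strategies tolerate better.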
Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems
Andrade, G.; Ferreira, R.; Teodoro, George; Rocha, Leonardo; Saltz, Joel H.; Kurc, Tahsin
2015-01-01
High performance computing is experiencing a major paradigm shift with the introduction of accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made available a tremendous computing power at low cost, and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver a very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines are still being executed on a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks which are allocated to nodes of a distributed memory machine in coarse-grain, but each of them may be composed of several finer-grain tasks which can be allocated to different devices within the node. We propose and implement novel performance aware scheduling techniques that can be used to allocate tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology, and our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also experimentally show that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales. PMID:26640423
NASA Technical Reports Server (NTRS)
Weeks, Cindy Lou
1986-01-01
Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.
2016-08-10
AFRL-AFOSR-JP-TR-2016-0073 Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation ...2016 4. TITLE AND SUBTITLE Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation 5a...performances on various machine learning tasks and it naturally lends itself to fast parallel implementations. Despite this, very little work has been
Large Scale Analysis of Geospatial Data with Dask and XArray
NASA Astrophysics Data System (ADS)
Zender, C. S.; Hamman, J.; Abernathey, R.; Evans, K. J.; Rocklin, M.
2017-12-01
The analysis of geospatial data with high level languages has accelerated innovation and the impact of existing data resources. However, as datasets grow beyond single-machine memory, data structures within these high level languages can become a bottleneck. New libraries like Dask and XArray resolve some of these scalability issues, providing interactive workflows that are familiar to high-level-language researchers while also scaling out to much larger datasets. This broadens the access of researchers to larger datasets on high performance computers and, through interactive development, reduces time-to-insight when compared to traditional parallel programming techniques (MPI). This talk describes Dask, a distributed dynamic task scheduler; Dask.array, a multi-dimensional array that copies the popular NumPy interface; and XArray, a library that wraps NumPy/Dask.array with labeled and indexed axes, implementing the CF conventions. We discuss both the basic design of these libraries and how they change interactive analysis of geospatial data, and also recent benefits and challenges of distributed computing on clusters of machines.
Jackin, Boaz Jessie; Watanabe, Shinpei; Ootsu, Kanemitsu; Ohkawa, Takeshi; Yokota, Takashi; Hayasaki, Yoshio; Yatagai, Toyohiko; Baba, Takanobu
2018-04-20
A parallel computation method for large-size Fresnel computer-generated hologram (CGH) is reported. The method was introduced by us in an earlier report as a technique for calculating Fourier CGH from 2D object data. In this paper we extend the method to compute Fresnel CGH from 3D object data. The scale of the computation problem is also expanded to 2 gigapixels, making it closer to real application requirements. The significant feature of the reported method is its ability to avoid communication overhead and thereby fully utilize the computing power of parallel devices. The method exhibits three layers of parallelism that favor small to large scale parallel computing machines. Simulation and optical experiments were conducted to demonstrate the workability and to evaluate the efficiency of the proposed technique. A two-times improvement in computation speed has been achieved compared to the conventional method, on a 16-node cluster (one GPU per node) utilizing only one layer of parallelism. A 20-times improvement in computation speed has been estimated utilizing two layers of parallelism on a very large-scale parallel machine with 16 nodes, where each node has 16 GPUs.
Virtual machine-based simulation platform for mobile ad-hoc network-based cyber infrastructure
Yoginath, Srikanth B.; Perumalla, Kayla S.; Henz, Brian J.
2015-09-29
In modeling and simulating complex systems such as mobile ad-hoc networks (MANETs) in defense communications, it is a major challenge to reconcile multiple important considerations: the rapidity of unavoidable changes to the software (network layers and applications), the difficulty of modeling the critical, implementation-dependent behavioral effects, the need to sustain larger scale scenarios, and the desire for faster simulations. Here we present our approach in successfully reconciling them using a virtual time-synchronized virtual machine (VM)-based parallel execution framework that accurately lifts both the devices as well as the network communications to a virtual time plane while retaining full fidelity. At the core of our framework is a scheduling engine that operates at the level of a hypervisor scheduler, offering a unique ability to execute multi-core guest nodes over multi-core host nodes in an accurate, virtual time-synchronized manner. In contrast to other related approaches that suffer from either speed or accuracy issues, our framework provides MANET node-wise scalability, high fidelity of software behaviors, and time-ordering accuracy. The design and development of this framework is presented, and an actual implementation based on the widely used Xen hypervisor system is described. Benchmarks with synthetic and actual applications are used to identify the benefits of our approach. The time inaccuracy of traditional emulation methods is demonstrated, in comparison with the accurate execution of our framework verified by theoretically correct results expected from analytical models of the same scenarios.
In the largest high fidelity tests, we are able to perform virtual time-synchronized simulation of 64-node VM-based full-stack, actual software behaviors of MANETs containing a mix of static and mobile (unmanned airborne vehicle) nodes, hosted on a 32-core host, with full fidelity of unmodified ad-hoc routing protocols, unmodified application executables, and user-controllable physical layer effects including inter-device wireless signal strength, reachability, and connectivity.
Virtual machine-based simulation platform for mobile ad-hoc network-based cyber infrastructure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B.; Perumalla, Kayla S.; Henz, Brian J.
In modeling and simulating complex systems such as mobile ad-hoc networks (MANETs) in defense communications, it is a major challenge to reconcile multiple important considerations: the rapidity of unavoidable changes to the software (network layers and applications), the difficulty of modeling the critical, implementation-dependent behavioral effects, the need to sustain larger scale scenarios, and the desire for faster simulations. Here we present our approach in successfully reconciling them using a virtual time-synchronized virtual machine (VM)-based parallel execution framework that accurately lifts both the devices as well as the network communications to a virtual time plane while retaining full fidelity. At the core of our framework is a scheduling engine that operates at the level of a hypervisor scheduler, offering a unique ability to execute multi-core guest nodes over multi-core host nodes in an accurate, virtual time-synchronized manner. In contrast to other related approaches that suffer from either speed or accuracy issues, our framework provides MANET node-wise scalability, high fidelity of software behaviors, and time-ordering accuracy. The design and development of this framework is presented, and an actual implementation based on the widely used Xen hypervisor system is described. Benchmarks with synthetic and actual applications are used to identify the benefits of our approach. The time inaccuracy of traditional emulation methods is demonstrated, in comparison with the accurate execution of our framework verified by theoretically correct results expected from analytical models of the same scenarios.
In the largest high fidelity tests, we are able to perform virtual time-synchronized simulation of 64-node VM-based full-stack, actual software behaviors of MANETs containing a mix of static and mobile (unmanned airborne vehicle) nodes, hosted on a 32-core host, with full fidelity of unmodified ad-hoc routing protocols, unmodified application executables, and user-controllable physical layer effects including inter-device wireless signal strength, reachability, and connectivity.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of these processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At the other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.
Solving the flexible job shop problem by hybrid metaheuristics-based multiagent model
NASA Astrophysics Data System (ADS)
Nouri, Houssem Eddine; Belkahla Driss, Olfa; Ghédira, Khaled
2018-03-01
The flexible job shop scheduling problem (FJSP) is a generalization of the classical job shop scheduling problem that allows each operation to be processed on one machine out of a set of alternative machines. The FJSP is an NP-hard problem consisting of two sub-problems, which are the assignment and the scheduling problems. In this paper, we propose to solve the FJSP with a hybrid metaheuristics-based clustered holonic multiagent model. First, a neighborhood-based genetic algorithm (NGA) is applied by a scheduler agent for a global exploration of the search space. Second, a local search technique is used by a set of cluster agents to guide the search in promising regions of the search space and to improve the quality of the NGA final population. The efficiency of our approach is explained by the flexible selection of the promising parts of the search space by the clustering operator after the genetic algorithm process, and by applying the intensification technique of the tabu search, which allows the search to restart from a set of elite solutions to attain new dominant scheduling solutions. Computational results are presented using four sets of well-known benchmark literature instances. New upper bounds are found, showing the effectiveness of the presented approach.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, C.
Almost every computer architect dreams of achieving high system performance with low implementation costs. A multigauge machine can reconfigure its data-path width, provide parallelism, achieve better resource utilization, and sometimes can trade computational precision for increased speed. A simple experimental method is used here to capture the main characteristics of multigauging. The measurements indicate evidence of near-optimal speedups. Adapting these ideas in designing parallel processors incurs low costs and provides flexibility. Several operational aspects of designing a multigauge machine are discussed as well. Thus, this research reports the technical, economical, and operational feasibility studies of multigauging.
NASA Technical Reports Server (NTRS)
Manohar, Mareboyana; Tilton, James C.
1994-01-01
A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
Telescoping magnetic ball bar test gage
Bryan, J.B.
1982-03-15
A telescoping magnetic ball bar test gage for determining the accuracy of machine tools, including robots, and those measuring machines having non-disengagable servo drives which cannot be clutched out. Two gage balls are held and separated from one another by a telescoping fixture which allows them relative radial motional freedom but not relative lateral motional freedom. The telescoping fixture comprises a parallel reed flexure unit and a rigid member. One gage ball is secured by a magnetic socket knuckle assembly which fixes its center with respect to the machine being tested. The other gage ball is secured by another magnetic socket knuckle assembly which is engaged or held by the machine in such manner that the center of that ball is directed to execute a prescribed trajectory, all points of which are equidistant from the center of the fixed gage ball. As the moving ball executes its trajectory, changes in the radial distance between the centers of the two balls caused by inaccuracies in the machine are determined or measured by a linear variable differential transformer (LVDT) assembly actuated by the parallel reed flexure unit. Measurements can be quickly and easily taken for multiple trajectories about several different fixed ball locations, thereby determining the accuracy of the machine.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computers to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
Applications of colored petri net and genetic algorithms to cluster tool scheduling
NASA Astrophysics Data System (ADS)
Liu, Tung-Kuan; Kuo, Chih-Jen; Hsiao, Yung-Chin; Tsai, Jinn-Tsong; Chou, Jyh-Horng
2005-12-01
In this paper, we propose a method which uses Coloured Petri Net (CPN) and genetic algorithm (GA) to obtain an optimal deadlock-free schedule and to solve the re-entrant problem for the flexible process of the cluster tool. The process of the cluster tool for producing a wafer usually can be classified into three types: 1) sequential process, 2) parallel process, and 3) sequential parallel process. But these processes are not economical enough to produce a variety of wafers in small volume. Therefore, this paper proposes the flexible process, where the operations of fabricating wafers are randomly arranged to achieve the best utilization of the cluster tool. However, the flexible process may have deadlock and re-entrant problems, which can be detected by CPN. On the other hand, GAs have been applied to find the optimal schedule for many types of manufacturing processes. Therefore, we integrate CPN and GAs to obtain an optimal schedule that handles the deadlock and re-entrant problems for the flexible process of the cluster tool.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial is intended as a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and directions in the rapidly growing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will draw on the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.
Run-time parallelization and scheduling of loops
NASA Technical Reports Server (NTRS)
Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay
1991-01-01
Run-time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run-time, wavefronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing, and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run-time reordering of loop indexes can have a significant impact on performance.
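The inspector/executor split described in this abstract can be illustrated with a small sketch. The inspector below assigns each loop iteration to a wavefront from a run-time dependence list, and the executor then visits the iterations wavefront by wavefront. The function names and the `depends_on` input format are assumptions made for this example; the paper's symbolic transformation rules and its Encore Multimax implementation are not reproduced, and the executor here runs sequentially.

```python
def compute_wavefronts(num_iterations, depends_on):
    """Inspector: group loop iterations into wavefronts.

    depends_on maps an iteration index to the earlier iterations whose
    results it needs. Iterations in the same wavefront have no
    dependences on one another and could execute concurrently.
    """
    level = [0] * num_iterations
    for i in range(num_iterations):
        for j in depends_on.get(i, ()):
            if j >= i:
                raise ValueError("an iteration may only depend on earlier ones")
            level[i] = max(level[i], level[j] + 1)

    wavefronts = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        wavefronts[lv].append(i)
    return wavefronts


def run_by_wavefront(wavefronts, body):
    """Executor: run the loop body in the reordered, wavefront order."""
    for front in wavefronts:
        for i in front:  # a parallel runtime would dispatch these together
            body(i)


# Dependences known only at run time, e.g. from an indirection array.
deps = {2: [0], 3: [1, 2], 4: [0]}
fronts = compute_wavefronts(5, deps)

visited = []
run_by_wavefront(fronts, visited.append)
```

In this example iterations 0 and 1 form the first wavefront, 2 and 4 the second, and 3 the last, so the reordered loop exposes two-way parallelism that a compile-time analysis of the indirect accesses could not prove.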
Mechanism to support generic collective communication across a variety of programming models
Almasi, Gheorghe [Ardsley, NY; Dozsa, Gabor [Ardsley, NY; Kumar, Sameer [White Plains, NY
2011-07-19
A system and method for supporting collective communications on a plurality of processors that use different parallel programming paradigms, in one aspect, may comprise a schedule defining one or more tasks in a collective operation, an executor that executes the task, a multisend module to perform one or more data transfer functions associated with the tasks, and a connection manager that controls one or more connections and identifies an available connection. The multisend module uses the available connection in performing the one or more data transfer functions. A plurality of processors that use different parallel programming paradigms can use a common implementation of the schedule module, the executor module, the connection manager and the multisend module via a language adaptor specific to a parallel programming paradigm implemented on a processor.
ProperCAD: A portable object-oriented parallel environment for VLSI CAD
NASA Technical Reports Server (NTRS)
Ramkumar, Balkrishna; Banerjee, Prithviraj
1993-01-01
Most parallel algorithms for VLSI CAD proposed to date have one important drawback: they work efficiently only on machines that they were designed for. As a result, algorithms designed to date are dependent on the architecture for which they are developed and do not port easily to other parallel architectures. A new project under way to address this problem is described. A Portable object-oriented parallel environment for CAD algorithms (ProperCAD) is being developed. The objectives of this research are (1) to develop new parallel algorithms that run in a portable object-oriented environment (CAD algorithms using a general purpose platform for portable parallel programming called CARM is being developed and a C++ environment that is truly object-oriented and specialized for CAD applications is also being developed); and (2) to design the parallel algorithms around a good sequential algorithm with a well-defined parallel-sequential interface (permitting the parallel algorithm to benefit from future developments in sequential algorithms). One CAD application that has been implemented as part of the ProperCAD project, flat VLSI circuit extraction, is described. The algorithm, its implementation, and its performance on a range of parallel machines are discussed in detail. It currently runs on an Encore Multimax, a Sequent Symmetry, Intel iPSC/2 and i860 hypercubes, a NCUBE 2 hypercube, and a network of Sun Sparc workstations. Performance data for other applications that were developed are provided: namely test pattern generation for sequential circuits, parallel logic synthesis, and standard cell placement.
NASA Technical Reports Server (NTRS)
Luke, Edward Allen
1993-01-01
Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.
The OpenMP Implementation of NAS Parallel Benchmarks and its Performance
NASA Technical Reports Server (NTRS)
Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry
1999-01-01
As the new ccNUMA architecture became popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. Implementation details will be discussed and performance will be compared with the MPI implementation. We have demonstrated that OpenMP can achieve very good results for parallelization on a shared memory system, but effective use of memory and cache is very important.
Run-time parallelization and scheduling of loops
NASA Technical Reports Server (NTRS)
Saltz, Joel H.; Mirchandaney, Ravi; Baxter, Doug
1988-01-01
The class of problems that can be effectively compiled by parallelizing compilers is discussed. This is accomplished with the doconsider construct which would allow these compilers to parallelize many problems in which substantial loop-level parallelism is available but cannot be detected by standard compile-time analysis. We describe and experimentally analyze mechanisms used to parallelize the work required for these types of loops. In each of these methods, a new loop structure is produced by modifying the loop to be parallelized. We also present the rules by which these loop transformations may be automated in order that they be included in language compilers. The main application area of the research involves problems in scientific computations and engineering. The workload used in our experiment includes a mixture of real problems as well as synthetically generated inputs. From our extensive tests on the Encore Multimax/320, we have reached the conclusion that for the types of workloads we have investigated, self-execution almost always performs better than pre-scheduling. Further, the improvement in performance that accrues as a result of global topological sorting of indices as opposed to the less expensive local sorting, is not very significant in the case of self-execution.
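The topological sorting of loop indices mentioned in this abstract can be illustrated with a small sketch. It is not the authors' doconsider mechanism; it only assumes that each iteration's dependences are known at run time and point to earlier iterations, and it groups iterations into wavefronts that could run concurrently. The function name and the `deps` mapping are hypothetical.

```python
def wavefronts(deps, n):
    """Group loop iterations 0..n-1 into wavefronts.

    deps maps an iteration index to the earlier iterations it depends on
    (assumption: every dependence j of iteration i satisfies j < i).
    Iterations in the same wavefront have no dependences on each other.
    """
    level = [0] * n
    for i in range(n):
        for j in deps.get(i, ()):
            if j >= i:
                raise ValueError("dependences must point to earlier iterations")
            # An iteration runs one wavefront after its latest predecessor.
            level[i] = max(level[i], level[j] + 1)
    groups = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups
```

For example, `wavefronts({2: [0], 3: [2, 1]}, 4)` places iterations 0 and 1 in the first wavefront, 2 in the second, and 3 in the third.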
Parallelization and automatic data distribution for nuclear reactor simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, L.M.
1997-07-01
Detailed attempts at realistic nuclear reactor simulations currently take many times real time to execute on high performance workstations. Even the fastest sequential machine cannot run these simulations fast enough to ensure that the best corrective measure is used during a nuclear accident to prevent a minor malfunction from becoming a major catastrophe. Since sequential computers have nearly reached the speed of light barrier, these simulations will have to be run in parallel to make significant improvements in speed. In physical reactor plants, parallelism abounds. Fluids flow, controls change, and reactions occur in parallel with only adjacent components directly affecting each other. These do not occur in the sequentialized manner, with global instantaneous effects, that is often used in simulators. Development of parallel algorithms that more closely approximate the real-world operation of a reactor may, in addition to speeding up the simulations, actually improve the accuracy and reliability of the predictions generated. Three types of parallel architecture (shared memory machines, distributed memory multicomputers, and distributed networks) are briefly reviewed as targets for parallelization of nuclear reactor simulation. Various parallelization models (loop-based model, shared memory model, functional model, data parallel model, and a combined functional and data parallel model) are discussed along with their advantages and disadvantages for nuclear reactor simulation. A variety of tools are introduced for each of the models. Emphasis is placed on the data parallel model as the primary focus for two-phase flow simulation. Tools to support data parallel programming for multiple component applications and special parallelization considerations are also discussed.
Scalable computing for evolutionary genomics.
Prins, Pjotr; Belhachemi, Dominique; Möller, Steffen; Smant, Geert
2012-01-01
Genomic data analysis in evolutionary biology is becoming so computationally intensive that analysis of multiple hypotheses and scenarios takes too long on a single desktop computer. In this chapter, we discuss techniques for scaling computations through parallelization of calculations, after giving a quick overview of advanced programming techniques. Unfortunately, parallel programming is difficult and requires special software design. The alternative, especially attractive for legacy software, is to introduce poor man's parallelization by running whole programs in parallel as separate processes, using job schedulers. Such pipelines are often deployed on bioinformatics computer clusters. Recent advances in PC virtualization have made it possible to run a full computer operating system, with all of its installed software, on top of another operating system, inside a "box," or virtual machine (VM). Such a VM can flexibly be deployed on multiple computers, in a local network, e.g., on existing desktop PCs, and even in the Cloud, to create a "virtual" computer cluster. Many bioinformatics applications in evolutionary biology can be run in parallel, running processes in one or more VMs. Here, we show how a ready-made bioinformatics VM image, named BioNode, effectively creates a computing cluster, and pipeline, in a few steps. This allows researchers to scale-up computations from their desktop, using available hardware, anytime it is required. BioNode is based on Debian Linux and can run on networked PCs and in the Cloud. Over 200 bioinformatics and statistical software packages, of interest to evolutionary biology, are included, such as PAML, Muscle, MAFFT, MrBayes, and BLAST. Most of these software packages are maintained through the Debian Med project. In addition, BioNode contains convenient configuration scripts for parallelizing bioinformatics software. 
Where Debian Med encourages packaging free and open source bioinformatics software through one central project, BioNode encourages creating free and open source VM images, for multiple targets, through one central project. BioNode can be deployed on Windows, OSX, Linux, and in the Cloud. Next to the downloadable BioNode images, we provide tutorials online, which empower bioinformaticians to install and run BioNode in different environments, as well as information for future initiatives, on creating and building such images.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm
NASA Technical Reports Server (NTRS)
Povitsky, A.
1998-01-01
In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines. This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
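For readers unfamiliar with the forward and backward steps this abstract refers to, the following is a minimal serial sketch of the standard Thomas algorithm for a single tridiagonal system. It is not the pipelined or reformulated parallel version described above; the function name and argument layout are assumptions made for illustration.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the serial Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is ignored),
    upper[i] multiplies x[i+1] in row i (upper[-1] is ignored).
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    # Forward step: eliminate the sub-diagonal row by row.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Backward step: substitute from the last unknown upwards.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

The backward loop cannot begin for a given line until its forward loop has finished, which is the dependency that pipelined variants have to schedule around.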
Abstract quantum computing machines and quantum computational logics
NASA Astrophysics Data System (ADS)
Chiara, Maria Luisa Dalla; Giuntini, Roberto; Sergioli, Giuseppe; Leporini, Roberto
2016-06-01
Classical and quantum parallelism are deeply different, although it is sometimes claimed that quantum Turing machines are nothing but special examples of classical probabilistic machines. We introduce the concepts of deterministic state machine, classical probabilistic state machine and quantum state machine. On this basis, we discuss the question: To what extent can quantum state machines be simulated by classical probabilistic state machines? Each state machine is devoted to a single task determined by its program. Real computers, however, behave differently, being able to solve different kinds of problems. This capacity can be modeled, in the quantum case, by the mathematical notion of abstract quantum computing machine, whose different programs determine different quantum state machines. The computations of abstract quantum computing machines can be linguistically described by the formulas of a particular form of quantum logic, termed quantum computational logic.
Design, development and use of the finite element machine
NASA Technical Reports Server (NTRS)
Adams, L. M.; Voigt, R. C.
1983-01-01
Some of the considerations that went into the design of the Finite Element Machine, a research asynchronous parallel computer, are described. The present status of the system is also discussed along with some indication of the type of results that were obtained.
Direct Machining of Low-Loss THz Waveguide Components With an RF Choke.
Lewis, Samantha M; Nanni, Emilio A; Temkin, Richard J
2014-12-01
We present results for the successful fabrication of low-loss THz metallic waveguide components using direct machining with a CNC end mill. The approach uses a split-block machining process with the addition of an RF choke running parallel to the waveguide. The choke greatly reduces coupling to the parasitic mode of the parallel-plate waveguide produced by the split-block. This method has demonstrated loss as low as 0.2 dB/cm at 280 GHz for a copper WR-3 waveguide. It has also been used in the fabrication of 3 and 10 dB directional couplers in brass, demonstrating excellent agreement with design simulations from 240-260 GHz. The method may be adapted to structures with features on the order of 200 μm.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roberts, D
Purpose: A unified database system was developed to allow accumulation, review and analysis of quality assurance (QA) data for measurement, treatment, imaging and simulation equipment in our department. Recording these data in a database allows a unified and structured approach to review and analysis of data gathered using commercial database tools. Methods: A clinical database was developed to track records of quality assurance operations on linear accelerators, a computed tomography (CT) scanner, high dose rate (HDR) afterloader and imaging systems such as on-board imaging (OBI) and Calypso in our department. The database was developed using the Microsoft Access database and the Visual Basic for Applications (VBA) programming interface. Separate modules were written for accumulation, review and analysis of daily, monthly and annual QA data. All modules were designed to use structured query language (SQL) as the basis of data accumulation and review. The SQL strings are dynamically re-written at run time. The database also features embedded documentation, storage of documents produced during QA activities and the ability to annotate all data within the database. Tests are defined in a set of tables that define test type, specific value, and schedule. Results: Daily, monthly and annual QA data have been taken in parallel with established procedures to test MQA. The database has been used to aggregate data across machines to examine the consistency of machine parameters and operations within the clinic for several months. Conclusion: The MQA application has been developed as an interface to a commercially available SQL engine (JET 5.0) and a standard database back-end. The MQA system has been used for several months for routine data collection. The system is robust, relatively simple to extend and can be migrated to a commercial SQL server.
Automatic recognition of vector and parallel operations in a higher level language
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1971-01-01
A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
The MICRO-BOSS scheduling system: Current status and future efforts
NASA Technical Reports Server (NTRS)
Sadeh, Norman M.
1992-01-01
In this paper, a micro-opportunistic approach to factory scheduling was described that closely monitors the evolution of bottlenecks during the construction of the schedule and continuously redirects search towards the bottleneck that appears to be most critical. This approach differs from earlier opportunistic approaches, as it does not require scheduling large resource subproblems or large job subproblems before revising the current scheduling strategy. This micro-opportunistic approach was implemented in the context of the MICRO-BOSS factory scheduling system. A study comparing MICRO-BOSS against a macro-opportunistic scheduler suggests that the additional flexibility of the micro-opportunistic approach to scheduling generally yields important reductions in both tardiness and inventory. Current research efforts include: adaptation of MICRO-BOSS to deal with sequence-dependent setups and development of micro-opportunistic reactive scheduling techniques that will enable the system to patch the schedule in the presence of contingencies such as machine breakdowns, raw materials arriving late, job cancellations, etc.
The QCDSP project —a status report
NASA Astrophysics Data System (ADS)
Chen, Dong; Chen, Ping; Christ, Norman; Edwards, Robert; Fleming, George; Gara, Alan; Hansen, Sten; Jung, Chulwoo; Kaehler, Adrian; Kasow, Steven; Kennedy, Anthony; Kilcup, Gregory; Luo, Yubin; Malureanu, Catalin; Mawhinney, Robert; Parsons, John; Sexton, James; Sui, Chengzhong; Vranas, Pavlos
1998-01-01
We give a brief overview of the massively parallel computer project underway for nearly the past four years, centered at Columbia University. A 6 Gflops and a 50 Gflops machine are presently being debugged for installation at OSU and SCRI respectively, while a 0.4 Tflops machine is under construction for Columbia and a 0.6 Tflops machine is planned for the new RIKEN Brookhaven Research Center.
SLURM: Simple Linux Utility for Resource Management
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M; Dunlap, C; Garlick, J
2002-07-08
Simple Linux Utility for Resource Management (SLURM) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for Linux clusters of thousands of nodes. Components include machine status, partition management, job management, scheduling and stream copy modules. The design also includes a scalable, general-purpose communication infrastructure. This paper presents an overview of the SLURM architecture and functionality.
Real-time Scheduling for GPUS with Applications in Advanced Automotive Systems
2015-01-01
3.7 Architecture of GPU tasklet scheduling infrastructure ... throughput. This disparity is even greater when we consider mobile CPUs, such as those designed by ARM. For instance, the ARM Cortex-A15 series processor as ... stub library that replaces the GPGPU runtime within each virtual machine. The stub library communicates API calls to a GPGPU backend user-space daemon
Autonomous planning and scheduling on the TechSat 21 mission
NASA Technical Reports Server (NTRS)
Sherwood, R.; Chien, S.; Castano, R.; Rabideau, G.
2002-01-01
The Autonomous Sciencecraft Experiment (ASE) will fly onboard the Air Force TechSat 21 constellation of three spacecraft scheduled for launch in 2006. ASE uses onboard continuous planning, robust task and goal-based execution, model-based mode identification and reconfiguration, and onboard machine learning and pattern recognition to radically increase science return by enabling intelligent downlink selection and autonomous retargeting.
Longest jobs first algorithm in solving job shop scheduling using adaptive genetic algorithm (GA)
NASA Astrophysics Data System (ADS)
Alizadeh Sahzabi, Vahid; Karimi, Iman; Alizadeh Sahzabi, Navid; Mamaani Barnaghi, Peiman
2012-01-01
In this paper, a genetic algorithm is used to solve job shop scheduling problems. One example of the JSSP (Job Shop Scheduling Problem) is discussed, and I describe how such problems can be solved with a genetic algorithm. The goal in JSSP is to achieve the shortest processing time. Furthermore, I propose a method for completing all jobs in the shortest time. The method is based mainly on the genetic algorithm (GA), and crossover between parents always follows the rule that the longest process comes first in the job queue. In other words, chromosomes are sorted from the longest process to the shortest: "longest job first" means first identifying the machine with the most total processing time over all its jobs, which is the bottleneck, and then sorting the jobs belonging to that machine in descending order. Based on the results achieved, "longest jobs first" is the optimized status in job shop scheduling problems. In our results, the accuracy grows up to 94.7% for total processing time, and the method improved the accuracy of performing all jobs in the presented example by 4%.
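The "longest job first" ordering described in this abstract can be sketched as follows. This shows only the bottleneck identification and descending sort, not the genetic algorithm or its crossover operator; the function name and the nested-dictionary input format are hypothetical.

```python
def longest_job_first(machine_jobs):
    """Return the bottleneck machine and its jobs, longest first.

    machine_jobs maps machine name -> {job name: processing time}
    (an assumed input format for illustration).
    """
    # The bottleneck is the machine with the largest total processing time.
    bottleneck = max(machine_jobs, key=lambda m: sum(machine_jobs[m].values()))
    # Sort that machine's jobs in descending order of processing time.
    ordered = sorted(
        machine_jobs[bottleneck].items(), key=lambda kv: kv[1], reverse=True
    )
    return bottleneck, [job for job, _ in ordered]
```

With `{"M1": {"A": 3, "B": 5}, "M2": {"A": 4, "B": 2, "C": 6}}`, machine M2 carries the larger load (12 versus 8), so its jobs are ordered C, A, B.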
Multiphase complete exchange on Paragon, SP2 and CS-2
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.
1995-01-01
The overhead of interprocessor communication is a major factor in limiting the performance of parallel computer systems. The complete exchange is the severest communication pattern in that it requires each processor to send a distinct message to every other processor. This pattern is at the heart of many important parallel applications. On hypercubes, multiphase complete exchange has been developed and shown to provide optimal performance over varying message sizes. Most commercial multicomputer systems do not have a hypercube interconnect. However, they use special purpose hardware and dedicated communication processors to achieve very high performance communication and can be made to emulate the hypercube quite well. Multiphase complete exchange has been implemented on three contemporary parallel architectures: the Intel Paragon, IBM SP2 and Meiko CS-2. The essential features of these machines are described and their basic interprocessor communication overheads are discussed. The performance of multiphase complete exchange is evaluated on each machine. It is shown that the theoretical ideas developed for hypercubes are also applicable in practice to these machines and that multiphase complete exchange can lead to major savings in execution time over traditional solutions.
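The complete exchange pattern defined in this abstract can be made concrete with a short sketch. It builds the standard single-phase pairwise schedule for a power-of-two number of processors, in which processor i exchanges with processor i XOR k at step k. This is a simpler baseline, not the multiphase algorithm the paper evaluates, and the function name is hypothetical.

```python
def pairwise_exchange_schedule(p):
    """Build a pairwise complete-exchange schedule for p processors.

    Returns p - 1 steps; step k lists (sender, receiver) pairs with
    receiver = sender XOR k. Assumes p is a power of two, as on a hypercube.
    """
    if p <= 0 or p & (p - 1):
        raise ValueError("p must be a positive power of two")
    return [[(i, i ^ k) for i in range(p)] for k in range(1, p)]
```

Because XOR with a fixed nonzero k pairs the processors off without overlap, every processor sends exactly one message per step, and after p - 1 steps each processor has sent a distinct message to every other processor.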
Protocols for distributive scheduling
NASA Technical Reports Server (NTRS)
Richards, Stephen F.; Fox, Barry
1993-01-01
The increasing complexity of space operations and the inclusion of interorganizational and international groups in the planning and control of space missions lead to requirements for greater communication, coordination, and cooperation among mission schedulers. These schedulers must jointly allocate scarce shared resources among the various operational and mission oriented activities while adhering to all constraints. This scheduling environment is complicated by such factors as the presence of varying perspectives and conflicting objectives among the schedulers, the need for different schedulers to work in parallel, and limited communication among schedulers. Smooth interaction among schedulers requires the use of protocols that govern such issues as resource sharing, authority to update the schedule, and communication of updates. This paper addresses the development and characteristics of such protocols and their use in a distributed scheduling environment that incorporates computer-aided scheduling tools. An example problem is drawn from the domain of space shuttle mission planning.
Distributed project scheduling at NASA: Requirements for manual protocols and computer-based support
NASA Technical Reports Server (NTRS)
Richards, Stephen F.
1992-01-01
The increasing complexity of space operations and the inclusion of interorganizational and international groups in the planning and control of space missions lead to requirements for greater communication, coordination, and cooperation among mission schedulers. These schedulers must jointly allocate scarce shared resources among the various operational and mission oriented activities while adhering to all constraints. This scheduling environment is complicated by such factors as the presence of varying perspectives and conflicting objectives among the schedulers, the need for different schedulers to work in parallel, and limited communication among schedulers. Smooth interaction among schedulers requires the use of protocols that govern such issues as resource sharing, authority to update the schedule, and communication of updates. This paper addresses the development and characteristics of such protocols and their use in a distributed scheduling environment that incorporates computer-aided scheduling tools. An example problem is drawn from the domain of Space Shuttle mission planning.
Planning for rover opportunistic science
NASA Technical Reports Server (NTRS)
Gaines, Daniel M.; Estlin, Tara; Forest, Fisher; Chouinard, Caroline; Castano, Rebecca; Anderson, Robert C.
2004-01-01
The Mars Exploration Rover Spirit recently set a record for the furthest distance traveled in a single sol on Mars. Future planetary exploration missions are expected to use even longer drives to position rovers in areas of high scientific interest. This increase provides the potential for a large rise in the number of new science collection opportunities as the rover traverses the Martian surface. In this paper, we describe the OASIS system, which provides autonomous capabilities for dynamically identifying and pursuing these science opportunities during long-range traverses. OASIS uses machine learning and planning and scheduling techniques to address this goal. Machine learning techniques are applied to analyze data as it is collected and quickly determine new science goals and priorities on these goals. Planning and scheduling techniques are used to alter the behavior of the rover so that new science measurements can be performed while still obeying resource and other mission constraints. We will introduce OASIS and describe how planning and scheduling algorithms support opportunistic science.
Scheduling algorithms for automatic control systems for technological processes
NASA Astrophysics Data System (ADS)
Chernigovskiy, A. S.; Tsarev, R. Yu; Kapulin, D. V.
2017-01-01
Wide use of automatic process control systems and of high-performance systems containing a number of computers (processors) creates opportunities for high-quality, fast production that increases the competitiveness of an enterprise. Exact and fast calculations, control computation, and processing of big data arrays all require a high level of productivity and, at the same time, minimum time for data handling and for obtaining results. In order to reach the best time, it is necessary not only to use computing resources optimally, but also to design and develop the software so that the time gain is maximal. For this purpose, task (job or operation) scheduling techniques for multi-machine/multiprocessor systems are applied. This paper considers some basic task scheduling methods for multi-machine process control systems, brings out their advantages and disadvantages, and offers some usage considerations for developing software for automatic process control systems.
NASA Technical Reports Server (NTRS)
Sanz, J.; Pischel, K.; Hubler, D.
1992-01-01
An application for parallel computation on a combined cluster of powerful workstations and supercomputers was developed. A Parallel Virtual Machine (PVM) is used as the message-passing language in a macro-tasking parallelization of the Aerodynamic Inverse Design and Analysis for a Full Engine computer code. The heterogeneous nature of the cluster is perfectly handled by the controlling host machine. Communication is established via Ethernet with the TCP/IP protocol over an open network. A reasonable overhead is imposed for internode communication, rendering an efficient utilization of the engaged processors. Perhaps one of the most interesting features of the system is its versatile nature, which permits the use of available computational resources that are experiencing less use at a given point in time.
A Model for Speedup of Parallel Programs
1997-01-01
Sanjeev K. Setia. The interaction between memory allocation and adaptive partitioning in message-passing multicomputers. In IPPS Workshop on Job...Scheduling Strategies for Parallel Processing, pages 89-99, 1995. [15] Sanjeev K. Setia and Satish K. Tripathi. A comparative analysis of static
Job Shop Scheduling Focusing on Role of Buffer
NASA Astrophysics Data System (ADS)
Hino, Rei; Kusumi, Tetsuya; Yoo, Jae-Kyu; Shimizu, Yoshiaki
A scheduling problem is formulated in order to consistently manage each manufacturing resource, including machine tools, assembly robots, AGV, storehouses, material shelves, and so on. The manufacturing resources are classified into three types: producer, location, and mover. This paper focuses especially on the role of the buffer, and the differences among these types are analyzed. A unified scheduling formulation is derived from the analytical results based on the resource’s roles. Scheduling procedures based on dispatching rules are also proposed in order to numerically evaluate job shop-type production having finite buffer capacity. The influences of the capacity of bottle-necked production devices and the buffer on productivity are discussed.
Application of a hybrid generation/utility assessment heuristic to a class of scheduling problems
NASA Technical Reports Server (NTRS)
Heyward, Ann O.
1989-01-01
A two-stage heuristic solution approach for a class of multiobjective, n-job, 1-machine scheduling problems is described. Minimization of job-to-job interference for n jobs is sought. The first stage generates alternative schedule sequences by interchanging pairs of schedule elements. The set of alternative sequences can represent nodes of a decision tree; each node is reached via decision to interchange job elements. The second stage selects the parent node for the next generation of alternative sequences through automated paired comparison of objective performance for all current nodes. An application of the heuristic approach to communications satellite systems planning is presented.
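The two stages described in this abstract (generate alternatives by pairwise interchange, then select the next parent) can be sketched roughly as below. The sketch substitutes a single caller-supplied cost function for the paper's multiobjective paired comparison, so it is a simplified stand-in; all names are hypothetical.

```python
from itertools import combinations


def interchange_neighbors(seq):
    """Stage 1: every sequence reachable by swapping one pair of elements."""
    out = []
    for i, j in combinations(range(len(seq)), 2):
        s = list(seq)
        s[i], s[j] = s[j], s[i]
        out.append(tuple(s))
    return out


def improve(seq, cost, max_rounds=100):
    """Stage 2, repeated: pick the best neighbor as the next parent.

    Stops when no interchange lowers the (assumed single-objective) cost.
    """
    current = tuple(seq)
    for _ in range(max_rounds):
        best = min(interchange_neighbors(current), key=cost, default=current)
        if cost(best) >= cost(current):
            break
        current = best
    return current
```

Each call to `interchange_neighbors` corresponds to expanding one node of the decision tree, and `improve` walks down the tree by always choosing the lowest-cost child.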
NASA Astrophysics Data System (ADS)
Neff, John A.
1989-12-01
Experiments originating from Gestalt psychology have shown that representing information in a symbolic form provides a more effective means to understanding. Computer scientists have been struggling for the last two decades to determine how best to create, manipulate, and store collections of symbolic structures. In the past, much of this struggling led to software innovations because that was the path of least resistance. For example, the development of heuristics for organizing the searching through knowledge bases was much less expensive than building massively parallel machines that could search in parallel. That is now beginning to change with the emergence of parallel architectures which are showing the potential for handling symbolic structures. This paper will review the relationships between symbolic computing and parallel computing architectures, and will identify opportunities for optics to significantly impact the performance of such computing machines. Although neural networks are an exciting subset of massively parallel computing structures, this paper will not touch on this area since it is receiving a great deal of attention in the literature. That is, the concepts presented herein do not consider the distributed representation of knowledge.
File-access characteristics of parallel scientific workloads
NASA Technical Reports Server (NTRS)
Nieuwejaar, Nils; Kotz, David; Purakayastha, Apratim; Best, Michael; Ellis, Carla Schlatter
1995-01-01
Phenomenal improvements in the computational performance of multiprocessors have not been matched by comparable gains in I/O system performance. This imbalance has resulted in I/O becoming a significant bottleneck for many scientific applications. One key to overcoming this bottleneck is improving the performance of parallel file systems. The design of a high-performance parallel file system requires a comprehensive understanding of the expected workload. Unfortunately, until recently, no general workload studies of parallel file systems have been conducted. The goal of the CHARISMA project was to remedy this problem by characterizing the behavior of several production workloads, on different machines, at the level of individual reads and writes. The first set of results from the CHARISMA project describe the workloads observed on an Intel iPSC/860 and a Thinking Machines CM-5. This paper is intended to compare and contrast these two workloads for an understanding of their essential similarities and differences, isolating common trends and platform-dependent variances. Using this comparison, we are able to gain more insight into the general principles that should guide parallel file-system design.
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 10^5-10^6) in reasonable CPU times. The performances of our implementation of the parallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
Proceedings of the Expert Systems Workshop Held in Pacific Grove, California on 16-18 April 1986
1986-04-18
13. NUMBER OF PAGES 197 15. SECURITY CLASS. (of this report) UNCLASSIFIED 15a. DECLASSIFICATION/DOWNGRADING SCHEDULE 16. DISTRIBUTION...are distributed and parallel. * - Features unimplemented at present; scheduled for phase 2. Table 1-1: Key design characteristics of ABE 2. a...data structuring techniques and a semi-deterministic scheduler. A program for the DF framework consists of a number of independent processing modules
Telescoping magnetic ball bar test gage
Bryan, J.B.
1984-03-13
A telescoping magnetic ball bar test gage for determining the accuracy of machine tools, including robots, and those measuring machines having non-disengageable servo drives which cannot be clutched out is disclosed. Two gage balls are held and separated from one another by a telescoping fixture which allows them relative radial motional freedom but not relative lateral motional freedom. The telescoping fixture comprises a parallel reed flexure unit and a rigid member. One gage ball is secured by a magnetic socket knuckle assembly which fixes its center with respect to the machine being tested. The other gage ball is secured by another magnetic socket knuckle assembly which is engaged or held by the machine in such manner that the center of that ball is directed to execute a prescribed trajectory, all points of which are equidistant from the center of the fixed gage ball. As the moving ball executes its trajectory, changes in the radial distance between the centers of the two balls caused by inaccuracies in the machine are determined or measured by a linear variable differential transformer (LVDT) assembly actuated by the parallel reed flexure unit. Measurements can be quickly and easily taken for multiple trajectories about several different fixed ball locations, thereby determining the accuracy of the machine. 3 figs.
A user interface for a knowledge-based planning and scheduling system
NASA Technical Reports Server (NTRS)
Mulvehill, Alice M.
1988-01-01
The objective of EMPRESS (Expert Mission Planning and Replanning Scheduling System) is to support the planning and scheduling required to prepare science and application payloads for flight aboard the US Space Shuttle. EMPRESS was designed and implemented in Zetalisp on a 3600 series Symbolics Lisp machine. Initially, EMPRESS was built as a concept demonstration system. The system has since been modified and expanded to ensure that the data have integrity. Issues underlying the design and development of the EMPRESS-I interface, results from a system usability assessment, and consequent modifications are described.
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. 
Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Workflow Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing) involves establishing a distributed environment, where issues of, e.g., remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler.
Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
The effect of embedded bonus rounds on slot machine preference.
Belisle, Jordan; Owens, Kelti; Dixon, Mark R; Malkin, Albert; Jordan, Sam D
2017-04-01
Twenty-three university students completed a simulated slot machine task involving the concurrent presentation of two slot machines that were varied both in win density and the inclusion of a bonus round feature to evaluate the effect of embedded bonus rounds on participant response allocation. The results suggest that participants allocated a greater percentage of responses to machines with embedded bonus rounds across both dense (Bonus: M = 68.4, SD = 19.2; No Bonus: M = 51.2, SD = 9.6) and lean (Bonus: M = 48.8, SD = 9.6; No Bonus: M = 31.6, SD = 19.2) reinforcement schedules, in which the overall reinforcement rate across all machines was held constant. © 2016 Society for the Experimental Analysis of Behavior.
Next Generation Parallelization Systems for Processing and Control of PDS Image Node Assets
NASA Astrophysics Data System (ADS)
Verma, R.
2017-06-01
We present next-generation parallelization tools to help Planetary Data System (PDS) Imaging Node (IMG) better monitor, process, and control changes to nearly 650 million file assets and over a dozen machines on which they are referenced or stored.
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial, parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.
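The block-structured abstraction in this abstract (decompose data into blocks, assign blocks to processing elements, iterate over blocks) can be illustrated with a small serial sketch. This is plain Python and not the DIY2 API; the function names `decompose`, `assign_round_robin`, and `foreach_block`, the contiguous split, and the round-robin assignment policy are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Sequence


def decompose(data: Sequence[float], num_blocks: int) -> List[List[float]]:
    """Split data into num_blocks contiguous blocks of near-equal size."""
    base, extra = divmod(len(data), num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        size = base + (1 if b < extra else 0)  # first `extra` blocks get one more item
        blocks.append(list(data[start:start + size]))
        start += size
    return blocks


def assign_round_robin(num_blocks: int, num_workers: int) -> Dict[int, List[int]]:
    """Map block ids to workers round-robin (an assumed policy, not DIY2's)."""
    owners: Dict[int, List[int]] = {w: [] for w in range(num_workers)}
    for b in range(num_blocks):
        owners[b % num_workers].append(b)
    return owners


def foreach_block(blocks: List[List[float]],
                  owners: Dict[int, List[int]],
                  fn: Callable[[List[float]], float]) -> Dict[int, float]:
    """Apply fn to every block, worker by worker; results are keyed by block id."""
    results: Dict[int, float] = {}
    for _worker, block_ids in owners.items():
        for b in block_ids:  # a real runtime could run these on threads or processes
            results[b] = fn(blocks[b])
    return results


blocks = decompose(list(range(10)), 4)
owners = assign_round_robin(len(blocks), 2)
partial_sums = foreach_block(blocks, owners, sum)
total = sum(partial_sums.values())  # a trivial "reduce" over per-block results
```

Because the computation is written per block, the same `fn` could in principle be run serially, on threads, or out of core without change, which is the property the abstract attributes to this style of decomposition.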
Statistical Analysis of NAS Parallel Benchmarks and LINPACK Results
NASA Technical Reports Server (NTRS)
Meuer, Hans-Werner; Simon, Horst D.; Strohmeier, Erich; Lasinski, T. A. (Technical Monitor)
1994-01-01
In the last three years extensive performance data have been reported for parallel machines both based on the NAS Parallel Benchmarks, and on LINPACK. In this study we have used the reported benchmark results and performed a number of statistical experiments using factor, cluster, and regression analyses. In addition to the performance results of LINPACK and the eight NAS parallel benchmarks, we have also included peak performance of the machine, and the LINPACK n and n(sub 1/2) values. Some of the results and observations can be summarized as follows: 1) All benchmarks are strongly correlated with peak performance. 2) LINPACK and EP each have a unique signature. 3) The remaining NPB can be grouped into three groups as follows: (CG and IS), (LU and SP), and (MG, FT, and BT). Hence three (or four with EP) benchmarks are sufficient to characterize the overall NPB performance. Our poster presentation will follow a standard poster format, and will present the data of our statistical analysis in detail.
Software For Integer Programming
NASA Technical Reports Server (NTRS)
Fogle, F. R.
1992-01-01
Improved Exploratory Search Technique for Pure Integer Linear Programming Problems (IESIP) program optimizes objective function of variables subject to confining functions or constraints, using discrete optimization or integer programming. Enables rapid solution of problems up to 10 variables in size. Integer programming required for accuracy in modeling systems containing small number of components, distribution of goods, scheduling operations on machine tools, and scheduling production in general. Written in Borland's TURBO Pascal.
Address tracing for parallel machines
NASA Technical Reports Server (NTRS)
Stunkel, Craig B.; Janssens, Bob; Fuchs, W. Kent
1991-01-01
Recently implemented parallel system address-tracing methods based on several metrics are surveyed. The issues specific to collection of traces for both shared and distributed memory parallel computers are highlighted. Five general categories of address-trace collection methods are examined: hardware-captured, interrupt-based, simulation-based, altered microcode-based, and instrumented program-based traces. The problems unique to shared memory and distributed memory multiprocessors are examined separately.
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter
Loganathan, Shyamala; Mukherjee, Saswati
2015-01-01
Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through the internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations forming their own private cloud. Since the resources are limited in these private clouds, maximizing the utilization of resources and giving guaranteed service to the user are the ultimate goals. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and a resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms. PMID:26473166
NASA Technical Reports Server (NTRS)
Borse, John E.; Owens, Christopher C.
1992-01-01
Our research focuses on the problem of recovering from perturbations in large-scale schedules, specifically on the ability of a human-machine partnership to dynamically modify an airline schedule in response to unanticipated disruptions. This task is characterized by massive interdependencies and a large space of possible actions. Our approach is to apply the following: qualitative, knowledge-intensive techniques relying on a memory of stereotypical failures and appropriate recoveries; and quantitative techniques drawn from the Operations Research community's work on scheduling. Our main scientific challenge is to represent schedules, failures, and repairs so as to make both sets of techniques applicable to the same data. This paper outlines ongoing research in which we are cooperating with United Airlines to develop our understanding of the scientific issues underlying the practicalities of dynamic, real-time schedule repair.
Production scheduling with discrete and renewable additional resources
NASA Astrophysics Data System (ADS)
Kalinowski, K.; Grabowik, C.; Paprocka, I.; Kempa, W.
2015-11-01
In this paper an approach to planning additional resources when scheduling operations is discussed. The considered resources are assumed to be discrete and renewable. In most research in the scheduling domain, the basic and often the only type of resource considered is a workstation. It can be understood as a machine, a device or even as a separate space on the shop floor. In many cases, during the detailed scheduling of operations, the need to use more than one resource to carry out an operation can be indicated. Resource requirements for an operation may relate to different resources or to resources of the same type. Additional resources most often refer to human resources, tools or equipment whose limited availability in the manufacturing system may influence the execution dates of some operations. The paper presents the concept of dividing resources into basic and additional ones, together with a method for planning them. A situation in which the sets of basic and additional resources are not separable (the same additional resource may be a basic resource for another operation) is also considered. Scheduling operations that involve a greater number of resources can cause many difficulties, depending on whether the resource is involved for the entire duration of the operation, only in selected part(s) of the operation (e.g. as auxiliary staff at setup time), or cyclically, e.g. when an operator supports more than one machine or supervises the execution of several operations. For this reason the dates and working times of a resource's participation in an operation can differ. The presented issues are crucial when modelling a production scheduling environment and designing structures for scheduling software development.
ERIC Educational Resources Information Center
GLOVER, J.H.
THE CHIEF OBJECTIVE OF THIS STUDY OF SPEED-SKILL ACQUISITION WAS TO FIND A MATHEMATICAL MODEL CAPABLE OF SIMPLE GRAPHIC INTERPRETATION FOR INDUSTRIAL TRAINING AND PRODUCTION SCHEDULING AT THE SHOP FLOOR LEVEL. STUDIES OF MIDDLE SKILL DEVELOPMENT IN MACHINE AND VEHICLE ASSEMBLY, AIRCRAFT PRODUCTION, SPOOLMAKING AND THE MACHINING OF PARTS CONFIRMED…
Verification and Planning Based on Coinductive Logic Programming
NASA Technical Reports Server (NTRS)
Bansal, Ajay; Min, Richard; Simon, Luke; Mallya, Ajay; Gupta, Gopal
2008-01-01
Coinduction is a powerful technique for reasoning about unfounded sets, unbounded structures, infinite automata, and interactive computations [6]. Where induction corresponds to least fixed point semantics, coinduction corresponds to greatest fixed point semantics. Recently coinduction has been incorporated into logic programming and an elegant operational semantics developed for it [11, 12]. This operational semantics is the greatest fixed point counterpart of SLD resolution (SLD resolution imparts operational semantics to least fixed point based computations) and is termed co-SLD resolution. In co-SLD resolution, a predicate goal p(t) succeeds if it unifies with one of its ancestor calls. In addition, rational infinite terms are allowed as arguments of predicates. Infinite terms are represented as solutions to unification equations and the occurs check is omitted during the unification process. Coinductive Logic Programming (Co-LP) and co-SLD resolution can be used to elegantly perform model checking and planning. A combined SLD and co-SLD resolution based LP system forms the common basis for planning, scheduling, verification, model checking, and constraint solving [9, 4]. This is achieved by amalgamating SLD resolution, co-SLD resolution, and constraint logic programming [13] in a single logic programming system. Given that parallelism in logic programs can be implicitly exploited [8], complex, compute-intensive applications (planning, scheduling, model checking, etc.) can be executed in parallel on multi-core machines. Parallel execution can result in speed-ups as well as in larger instances of the problems being solved. In the remainder we elaborate on (i) how planning can be elegantly and efficiently performed under real-time constraints, (ii) how real-time systems can be elegantly and efficiently model-checked, as well as (iii) how hybrid systems can be verified in a combined system with both co-SLD and SLD resolution.
Implementations of co-SLD resolution as well as preliminary implementations of the planning and verification applications have been developed [4]. Co-LP and Model Checking: The vast majority of properties that are to be verified can be classified into safety properties and liveness properties. It is well known within model checking that safety properties can be verified by reachability analysis, i.e., if a counter-example to the property exists, it can be finitely determined by enumerating all the reachable states of the Kripke structure.
Parallel and Serial Processes in Visual Search
ERIC Educational Resources Information Center
Thornton, Thomas L.; Gilden, David L.
2007-01-01
A long-standing issue in the study of how people acquire visual information centers around the scheduling and deployment of attentional resources: Is the process serial, or is it parallel? A substantial empirical effort has been dedicated to resolving this issue. However, the results remain largely inconclusive because the methodologies that have…
Parallel Volunteer Learning during Youth Programs
ERIC Educational Resources Information Center
Lesmeister, Marilyn K.; Green, Jeremy; Derby, Amy; Bothum, Candi
2012-01-01
Lack of time is a hindrance for volunteers to participate in educational opportunities, yet volunteer success in an organization is tied to the orientation and education they receive. Meeting diverse educational needs of volunteers can be a challenge for program managers. Scheduling a Volunteer Learning Track for chaperones that is parallel to a…
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time-consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive-based parallel language, the MPI function-based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two important facts: the sparsity of the matrix is large enough (more than 98%), and only a small part of the matrix spectrum is needed to obtain converged results.
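As an illustration of why a symmetric variant of the coordinate format pairs naturally with a matrix-vector product, the sketch below stores only the upper triangle of a real symmetric matrix and applies each off-diagonal entry twice. The abstract does not specify the authors' actual format, so the upper-triangle convention and the name `sym_coo_matvec` are assumptions; the sketch is also serial, whereas the PSMV in the abstract is parallel.

```python
from typing import List, Tuple

# One stored entry: (row, col, value) with row <= col (upper triangle only).
Entry = Tuple[int, int, float]


def sym_coo_matvec(n: int, entries: List[Entry], x: List[float]) -> List[float]:
    """Compute y = A x for a real symmetric A given by its upper-triangle COO entries."""
    y = [0.0] * n
    for i, j, a in entries:
        y[i] += a * x[j]
        if i != j:
            # The mirrored entry A[j][i] = A[i][j] is implied, not stored.
            y[j] += a * x[i]
    return y


# Upper triangle of the symmetric matrix
#   [[2, 1, 0],
#    [1, 3, 4],
#    [0, 4, 5]]
entries = [(0, 0, 2.0), (0, 1, 1.0), (1, 1, 3.0), (1, 2, 4.0), (2, 2, 5.0)]
y = sym_coo_matvec(3, entries, [1.0, 2.0, 3.0])
```

Storing one triangle roughly halves the memory for off-diagonal entries. Iterative eigensolvers of the ARPACK kind only need the action of the matrix on a vector, which is why a fast product of this form matters when only a small part of the spectrum is required.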
A scheme for solving the plane-plane challenge in force measurements at the nanoscale.
Siria, Alessandro; Huant, Serge; Auvert, Geoffroy; Comin, Fabio; Chevrier, Joel
2010-05-19
Non-contact interaction between two parallel flat surfaces is a central paradigm in sciences. This situation is the starting point for a wealth of different models: the capacitor description in electrostatics, hydrodynamic flow, thermal exchange, the Casimir force, direct contact study, third body confinement such as liquids or films of soft condensed matter. The control of parallelism is so demanding that no versatile single force machine in this geometry has been proposed so far. Using a combination of nanopositioning based on inertial motors, of microcrystal shaping with a focused-ion beam (FIB) and of accurate in situ and real-time control of surface parallelism with X-ray diffraction, we propose here a "gedanken" surface-force machine that should enable one to measure interactions between movable surfaces separated by gaps in the micrometer and nanometer ranges.
Proceedings on Expert Systems Workshop Held in Pacific Grove, California on 16-18 April 1986
1986-04-01
list is empty, the scheduler process is guaranteed to be waiting... As a result, fewer evaluator cycles are wasted waiting for the scheduler process to...SECURITY CLASS. (of this report) UNCLASSIFIED 15a. DECLASSIFICATION/DOWNGRADING SCHEDULE 16. DISTRIBUTION STATEMENT (of this Report) APPROVED FOR PUBLIC...parallel. * - Features unimplemented at present; scheduled for phase 2. ...makes them easy to port to alternative new machines... To cover a larger set
Institute for Defense Analysis. Annual Report 1995.
1995-01-01
staff have been involved in the community-wide development of MPI as well as in its application to specific NSA problems. 35 Parallel Groebner ...Basis Code — Symbolic Computing on Parallel Machines The Groebner basis method is a set of algorithms for reformulating very complex algebraic expres
Yue, Lei; Guan, Zailin; Saif, Ullah; Zhang, Fei; Wang, Hao
2016-01-01
Group scheduling is significant for an efficient and cost-effective production system. However, setup times exist between the groups, and these must be reduced by sequencing the groups efficiently. The current research focuses on a sequence-dependent group scheduling problem with the aim of minimizing the makespan and the total weighted tardiness simultaneously. In most production scheduling problems, the processing time of jobs is assumed to be fixed. However, the actual processing time of jobs may be reduced due to the "learning effect". The integration of the sequence-dependent group scheduling problem with learning effects has rarely been considered in the literature. Therefore, the current research considers a single machine group scheduling problem with sequence-dependent setup times and learning effects simultaneously. A novel hybrid Pareto artificial bee colony algorithm (HPABC) with some steps of a genetic algorithm is proposed for the problem to obtain Pareto solutions. Furthermore, five different sizes of test problems (small, small medium, medium, large medium, large) are tested using the proposed HPABC. The Taguchi method is used to tune the effective parameters of the proposed HPABC for each problem category. The performance of HPABC is compared with three well-known multi-objective optimization algorithms: the improved strength Pareto evolutionary algorithm (SPEA2), the non-dominated sorting genetic algorithm II (NSGAII) and the particle swarm optimization algorithm (PSO). Results indicate that HPABC outperforms SPEA2, NSGAII and PSO and gives better Pareto optimal solutions in terms of diversity and quality for almost all instances of the different problem sizes.
Hybrid Metaheuristics for Solving a Fuzzy Single Batch-Processing Machine Scheduling Problem
Molla-Alizadeh-Zavardehi, S.; Tavakkoli-Moghaddam, R.; Lotfi, F. Hosseinzadeh
2014-01-01
This paper deals with a problem of minimizing total weighted tardiness of jobs in a real-world single batch-processing machine (SBPM) scheduling in the presence of fuzzy due date. In this paper, first a fuzzy mixed integer linear programming model is developed. Then, due to the complexity of the problem, which is NP-hard, we design two hybrid metaheuristics called GA-VNS and VNS-SA applying the advantages of genetic algorithm (GA), variable neighborhood search (VNS), and simulated annealing (SA) frameworks. Besides, we propose three fuzzy earliest due date heuristics to solve the given problem. Through computational experiments with several random test problems, a robust calibration is applied on the parameters. Finally, computational results on different-scale test problems are presented to compare the proposed algorithms. PMID:24883359
NASA Astrophysics Data System (ADS)
Buchner, Johannes
2011-12-01
Scheduling, the task of producing a time table for resources and tasks, is well known to become more difficult the more resources are involved (an NP-hard problem). This is about to become an issue in radio astronomy as observatories consisting of hundreds to thousands of telescopes are planned and operated. The Square Kilometre Array (SKA), which Australia and New Zealand bid to host, is aiming for scales where current approaches -- in construction, operation but also scheduling -- are insufficient. Although manual scheduling is common today, the problem is becoming complicated by the demand for (1) independent sub-arrays doing simultaneous observations, which requires the scheduler to plan parallel observations and (2) dynamic re-scheduling on changed conditions. Both of these requirements apply to the SKA, especially in the construction phase. We review the scheduling approaches taken in the astronomy literature, as well as investigate techniques from human schedulers and today's observatories. The scheduling problem is specified in general for scientific observations and in particular on radio telescope arrays. Also taken into account is the fact that the observatory may be oversubscribed, requiring the scheduling problem to be integrated with a planning process. We solve this long-term scheduling problem using a time-based encoding that works in the very general case of observation scheduling. This research then compares algorithms from various approaches, including fast heuristics from CPU scheduling, Linear Integer Programming, Genetic algorithms, and Branch-and-Bound enumeration schemes. Measures include not only goodness of the solution, but also scalability and re-scheduling capabilities. In conclusion, we have identified a fast and good scheduling approach that allows (re-)scheduling difficult and changing problems by combining heuristics with a Genetic algorithm using block-wise mutation operations.
We are able to explain and eradicate two problems in the literature: the inability of a GA to properly improve schedules and the generation of schedules with frequent interruptions. Finally, we demonstrate the scheduling framework for several operating telescopes: (1) dynamic re-scheduling with the AUT Warkworth 12m telescope, (2) scheduling for the Australian Mopra 22m telescope and (3) scheduling for the Allen Telescope Array. Furthermore, we discuss the applicability of the presented scheduling framework to the Atacama Large Millimeter/submillimeter Array (ALMA, in construction) and the SKA. In particular, during the development phase of the SKA, this dynamic, scalable scheduling framework can accommodate changing conditions.
Accelerating Large Scale Image Analyses on Parallel, CPU-GPU Equipped Systems
Teodoro, George; Kurc, Tahsin M.; Pan, Tony; Cooper, Lee A.D.; Kong, Jun; Widener, Patrick; Saltz, Joel H.
2014-01-01
The past decade has witnessed a major paradigm shift in high performance computing with the introduction of accelerators as general purpose processors. These computing devices make available very high parallel computing power at low cost and power consumption, transforming current high performance platforms into heterogeneous CPU-GPU equipped systems. Although the theoretical performance achieved by these hybrid systems is impressive, taking practical advantage of this computing power remains a very challenging problem. Most applications are still deployed to either GPU or CPU, leaving the other resource under- or un-utilized. In this paper, we propose, implement, and evaluate a performance aware scheduling technique along with optimizations to make efficient collaborative use of CPUs and GPUs on a parallel system. In the context of feature computations in large scale image analysis applications, our evaluations show that intelligently co-scheduling CPUs and GPUs can significantly improve performance over GPU-only or multi-core CPU-only approaches. PMID:25419545
Efficient parallel architecture for highly coupled real-time linear system applications
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo
1988-01-01
A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.
Run-time parallelization and scheduling of loops
NASA Technical Reports Server (NTRS)
Saltz, Joel H.; Mirchandaney, Ravi; Crowley, Kay
1990-01-01
Run time methods are studied to automatically parallelize and schedule iterations of a do loop in certain cases, where compile-time information is inadequate. The methods presented involve execution time preprocessing of the loop. At compile-time, these methods set up the framework for performing a loop dependency analysis. At run time, wave fronts of concurrently executable loop iterations are identified. Using this wavefront information, loop iterations are reordered for increased parallelism. Symbolic transformation rules are used to produce: inspector procedures that perform execution time preprocessing and executors or transformed versions of source code loop structures. These transformed loop structures carry out the calculations planned in the inspector procedures. Performance results are presented from experiments conducted on the Encore Multimax. These results illustrate that run time reordering of loop indices can have a significant impact on performance. Furthermore, the overheads associated with this type of reordering are amortized when the loop is executed several times with the same dependency structure.
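The inspector/executor idea in this abstract can be illustrated with a small sketch. It is not the authors' generated code: it assumes a hypothetical loop of the form `x[writes[i]] = x[reads[i]] + 1`, where the index arrays `reads` and `writes` are only known at run time, and all function names are invented for illustration. The inspector assigns each iteration a wavefront number from the flow, output, and anti dependences; the executor then runs the iterations wavefront by wavefront.

```python
def inspector(reads, writes):
    """Assign a wavefront number to every iteration (hypothetical helper)."""
    last_write = {}  # element -> wavefront of the latest iteration that wrote it
    last_read = {}   # element -> highest wavefront of any iteration that read it
    wave = []
    for r, w in zip(reads, writes):
        level = 0
        if r in last_write:   # flow dependence: read after an earlier write
            level = max(level, last_write[r] + 1)
        if w in last_write:   # output dependence: write after an earlier write
            level = max(level, last_write[w] + 1)
        if w in last_read:    # anti dependence: write after an earlier read
            level = max(level, last_read[w] + 1)
        wave.append(level)
        last_write[w] = level
        last_read[r] = max(last_read.get(r, -1), level)
    return wave


def run_sequential(x, reads, writes):
    """Reference execution in the original iteration order."""
    x = list(x)
    for r, w in zip(reads, writes):
        x[w] = x[r] + 1
    return x


def run_by_wavefront(x, reads, writes):
    """Executor: iterations in one wavefront are independent, so their
    relative order is free (reversed here to make that visible)."""
    x = list(x)
    wave = inspector(reads, writes)
    for level in range(max(wave, default=-1) + 1):
        members = [i for i, lv in enumerate(wave) if lv == level]
        for i in reversed(members):
            x[writes[i]] = x[reads[i]] + 1
    return x


reads = [0, 1, 0, 3, 2]
writes = [1, 2, 3, 4, 5]
print(inspector(reads, writes))
print(run_by_wavefront([0] * 6, reads, writes))
```

On a real parallel machine the inner loop over `members` is what would be distributed across processors; the inspector cost is paid once and reused whenever the loop runs again with the same dependency structure.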
On scheduling task systems with variable service times
NASA Astrophysics Data System (ADS)
Maset, Richard G.; Banawan, Sayed A.
1993-08-01
Several strategies have been proposed for developing optimal and near-optimal schedules for task systems (jobs consisting of multiple tasks that can be executed in parallel). Most such strategies, however, implicitly assume deterministic task service times. We show that these strategies are much less effective when service times are highly variable. We then evaluate two strategies—one adaptive, one static—that have been proposed for retaining high performance despite such variability. Both strategies are extensions of critical path scheduling, which has been found to be efficient at producing near-optimal schedules. We found the adaptive approach to be quite effective.
Multiprocessing the Sieve of Eratosthenes
NASA Technical Reports Server (NTRS)
Bokhari, S.
1986-01-01
The Sieve of Eratosthenes for finding prime numbers in recent years has seen much use as a benchmark algorithm for serial computers while its intrinsically parallel nature has gone largely unnoticed. The implementation of a parallel version of this algorithm for a real parallel computer, the Flex/32, is described and its performance discussed. It is shown that the algorithm is sensitive to several fundamental performance parameters of parallel machines, such as spawning time, signaling time, memory access, and overhead of process switching. Because of the nature of the algorithm, it is impossible to get any speedup beyond 4 or 5 processors unless some form of dynamic load balancing is employed. We describe the performance of our algorithm with and without load balancing and compare it with theoretical lower bounds and simulated results. It is straightforward to understand this algorithm and to check the final results. However, its efficient implementation on a real parallel machine requires thoughtful design, especially if dynamic load balancing is desired. The fundamental operations required by the algorithm are very simple: this means that the slightest overhead appears prominently in performance data. The Sieve thus serves not only as a very severe test of the capabilities of a parallel processor but is also an interesting challenge for the programmer.
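A minimal sketch of one way to decompose the Sieve for parallel execution is given below. It is not the Flex/32 implementation described in the abstract: it assumes a segmented variant in which the primes up to the square root of `n` are found serially and the remaining range is cut into chunks that workers sieve independently. The names `small_primes`, `sieve_segment`, `parallel_sieve`, and the parameters `workers` and `chunk` are illustrative. Using many more chunks than workers lets the pool hand out work as workers become free, which is a simple form of dynamic load balancing.

```python
from concurrent.futures import ThreadPoolExecutor
from math import isqrt


def small_primes(limit):
    """Serial sieve returning all primes <= limit."""
    flags = [True] * (limit + 1)
    for p in range(2, isqrt(limit) + 1):
        if flags[p]:
            for m in range(p * p, limit + 1, p):
                flags[m] = False
    return [p for p in range(2, limit + 1) if flags[p]]


def sieve_segment(lo, hi, primes):
    """Primes in [lo, hi), with lo >= 2, given every prime p with p*p < hi."""
    flags = [True] * (hi - lo)
    for p in primes:
        if p * p >= hi:
            break
        # first multiple of p inside the segment, never below p*p
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, keep in enumerate(flags) if keep]


def parallel_sieve(n, workers=4, chunk=1000):
    """All primes <= n, sieved chunk by chunk on a worker pool."""
    if n < 2:
        return []
    primes = small_primes(isqrt(n))
    bounds = [(lo, min(lo + chunk, n + 1)) for lo in range(2, n + 1, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: sieve_segment(b[0], b[1], primes), bounds)
        return [p for part in parts for p in part]


print(parallel_sieve(50, chunk=7))
```

In CPython, threads will not speed up this pure-Python loop; the sketch only shows the decomposition. The per-chunk work is so small that scheduling overhead is visible, which echoes the abstract's point that slight overheads appear prominently in performance data.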
PISCES: An environment for parallel scientific computation
NASA Technical Reports Server (NTRS)
Pratt, T. W.
1985-01-01
The parallel implementation of scientific computing environment (PISCES) is a project to provide high-level programming environments for parallel MIMD computers. Pisces 1, the first of these environments, is a FORTRAN 77 based environment which runs under the UNIX operating system. The Pisces 1 user programs in Pisces FORTRAN, an extension of FORTRAN 77 for parallel processing. The major emphasis in the Pisces 1 design is in providing a carefully specified virtual machine that defines the run-time environment within which Pisces FORTRAN programs are executed. Each implementation then provides the same virtual machine, regardless of differences in the underlying architecture. The design is intended to be portable to a variety of architectures. Currently Pisces 1 is implemented on a network of Apollo workstations and on a DEC VAX uniprocessor via simulation of the task level parallelism. An implementation for the Flexible Computing Corp. FLEX/32 is under construction. An introduction to the Pisces 1 virtual computer and the FORTRAN 77 extensions is presented. An example of an algorithm for the iterative solution of a system of equations is given. The most notable features of the design are the provision for several granularities of parallelism in programs and the provision of a window mechanism for distributed access to large arrays of data.
A Survey of Parallel Computing
1988-07-01
Evaluating Two Massively Parallel Machines. Communications of the ACM 29, 8 (August), pp. 752-758. Gajski, D.D., Padua, D.A., Kuck...Computer Architecture, edited by Gajski, D. D., Milutinovic, V. M., Siegel, H. J. and Furht, B. P. IEEE Computer Society Press, Washington, D.C., pp. 387-407
Molecular Symmetry in Ab Initio Calculations
NASA Astrophysics Data System (ADS)
Madhavan, P. V.; Written, J. L.
1987-05-01
A scheme is presented for the construction of the Fock matrix in LCAO-SCF calculations and for the transformation of basis integrals to LCAO-MO integrals that can utilize several symmetry unique lists of integrals corresponding to different symmetry groups. The algorithm is fully compatible with vector processing machines and is especially suited for parallel processing machines.
1980-09-01
SECURITY CLASS. (of this report) Unclassified 15a. DECLASSIFICATION/DOWNGRADING SCHEDULE 16. DISTRIBUTION STATEMENT (of this Report) Approved for...evaluation plan sketch is essentially a preliminary schedule outlining the order by day and time of day that the ARTEP missions previously selected (see...individual plans, which parallel the normal coordinating staff functions, are as follows: * The Schedule of Events - includes a list of major missions
NASA Technical Reports Server (NTRS)
Phillips, Jennifer K.
1995-01-01
Two of the current and most popular implementations of the Message-Passing Standard, Message Passing Interface (MPI), were contrasted: MPICH by Argonne National Laboratory, and LAM by the Ohio Supercomputer Center at Ohio State University. A parallel skyline matrix solver was adapted to be run in a heterogeneous environment using MPI. The Message-Passing Interface Forum was held in May 1994 and led to a specification of library functions that implement the message-passing model of parallel communication. LAM, which creates its own environment, is more robust in a highly heterogeneous network. MPICH uses the environment native to the machine architecture. While neither of these freeware implementations provides the performance of native message-passing or vendor's implementations, MPICH begins to approach that performance on the SP-2. The machines used in this study were: IBM RS6000, 3 Sun4, SGI, and the IBM SP-2. Each machine is unique and a few machines required specific modifications during the installation. When installed correctly, both implementations worked well with only minor problems.
ERIC Educational Resources Information Center
Kennedy, Mike
2003-01-01
Describes how facilities-management systems use technology to help schools and universities operate their buildings more efficiently, reduce energy consumption, manage inventory more accurately, keep track of supplies and maintenance schedules, and save money. (EV)
Telescoping magnetic ball bar test gage
Bryan, James B.
1984-01-01
A telescoping magnetic ball bar test gage for determining the accuracy of machine tools, including robots, and those measuring machines having non-disengageable servo drives which cannot be clutched out. Two gage balls (10, 12) are held and separated from one another by a telescoping fixture which allows them relative radial motional freedom but not relative lateral motional freedom. The telescoping fixture comprises a parallel reed flexure unit (14) and a rigid member (16, 18, 20, 22, 24). One gage ball (10) is secured by a magnetic socket knuckle assembly (34) which fixes its center with respect to the machine being tested. The other gage ball (12) is secured by another magnetic socket knuckle assembly (38) which is engaged or held by the machine in such manner that the center of that ball (12) is directed to execute a prescribed trajectory, all points of which are equidistant from the center of the fixed gage ball (10). As the moving ball (12) executes its trajectory, changes in the radial distance between the centers of the two balls (10, 12) caused by inaccuracies in the machine are determined or measured by a linear variable differential transformer (LVDT) assembly (50, 52, 54, 56, 58, 60) actuated by the parallel reed flexure unit (14). Measurements can be quickly and easily taken for multiple trajectories about several different fixed ball (10) locations, thereby determining the accuracy of the machine.
The checkpoint ordering problem
Hungerländer, P.
2017-01-01
We suggest a new variant of a row layout problem: Find an ordering of n departments with given lengths such that the total weighted sum of their distances to a given checkpoint is minimized. The Checkpoint Ordering Problem (COP) is of both theoretical and practical interest. It has several applications and is conceptually related to some well-studied combinatorial optimization problems, namely the Single-Row Facility Layout Problem, the Linear Ordering Problem and a variant of parallel machine scheduling. In this paper we study the complexity of the COP and its special cases. The general version of the COP with an arbitrary but fixed number of checkpoints is NP-hard in the weak sense. We propose both a dynamic programming algorithm and an integer linear programming approach for the COP. Our computational experiments indicate that the COP is hard to solve in practice. While the run time of the dynamic programming algorithm strongly depends on the length of the departments, the integer linear programming approach is able to solve instances with up to 25 departments to optimality. PMID:29170574
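To make the objective concrete, here is a brute-force sketch for a single checkpoint. It is neither the dynamic programming algorithm nor the integer linear programming model from the paper. It assumes that the departments are placed side by side on a line starting at coordinate 0, that the checkpoint sits at a given coordinate, and that distance is measured from each department's centre; these modelling details and the names `cop_cost` and `solve_cop_bruteforce` are assumptions, since the abstract does not fix them.

```python
from itertools import permutations


def cop_cost(order, lengths, weights, checkpoint):
    """Weighted sum of centre-to-checkpoint distances for one ordering."""
    position = 0.0  # left edge of the department currently being placed
    cost = 0.0
    for d in order:
        centre = position + lengths[d] / 2
        cost += weights[d] * abs(centre - checkpoint)
        position += lengths[d]
    return cost


def solve_cop_bruteforce(lengths, weights, checkpoint):
    """Try all n! orderings; only feasible for very small n."""
    n = len(lengths)
    return min(
        (cop_cost(order, lengths, weights, checkpoint), order)
        for order in permutations(range(n))
    )


best_cost, best_order = solve_cop_bruteforce([2, 1, 3], [1, 4, 1], checkpoint=0.0)
print(best_cost, best_order)
```

With the checkpoint at the left end, the cost behaves like a weighted completion-time objective, so the heavy, short department is placed first in this toy instance. The factorial growth of the search space is one informal way to see why exact methods are needed beyond a handful of departments.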
Cooperating Expert Systems For Space Station Power Distribution Management
NASA Astrophysics Data System (ADS)
Nguyen, T. A.; Chiou, W. C.
1987-02-01
In a complex system such as the manned Space Station, it is deemed necessary that many expert systems perform tasks in a concurrent and cooperative manner. An important question that arises is: what cooperative-task-performing models are appropriate for multiple expert systems to jointly perform tasks? The answer to this question will provide a crucial automation design criterion for the Space Station complex systems architecture. Based on a client/server model for performing tasks, we have developed a system that acts as a front-end to support loosely-coupled communications between expert systems running on multiple Symbolics machines. As an example, we use two ART*-based expert systems to demonstrate the concept of parallel symbolic manipulation for power distribution management and dynamic load planner/scheduler in the simulated Space Station environment. This on-going work will also explore other cooperative-task-performing models as alternatives which can evaluate inter and intra expert system communication mechanisms. It will serve as a testbed and a benchmarking tool for other Space Station expert subsystem communication and information exchange.
NASA Astrophysics Data System (ADS)
Jia, Zhao-hong; Pei, Ming-li; Leung, Joseph Y.-T.
2017-12-01
In this paper, we investigate the batch-scheduling problem with rejection on parallel machines with non-identical job sizes and arbitrary job-rejected weights. If a job is rejected, the corresponding penalty has to be paid. Our objective is to minimise the makespan of the processed jobs and the total rejection cost of the rejected jobs. Based on the selected multi-objective optimisation approaches, two problems, P1 and P2, are considered. In P1, the two objectives are linearly combined into one single objective. In P2, the two objectives are simultaneously minimised and the Pareto non-dominated solution set is to be found. Based on the ant colony optimisation (ACO), two algorithms, called LACO and PACO, are proposed to address the two problems, respectively. Two different objective-oriented pheromone matrices and heuristic information are designed. Additionally, a local optimisation algorithm is adopted to improve the solution quality. Finally, simulated experiments are conducted, and the comparative results verify the effectiveness and efficiency of the proposed algorithms, especially on large-scale instances.
Bellos, Christos; Papadopoulos, Athanassios; Rosso, Roberto; Fotiadis, Dimitrios I
2011-01-01
The CHRONIOUS system is an integrated platform aimed at the management of patients with chronic disease. One of the most important components of the system is a Decision Support System (DSS) that has been developed on a Smart Device (SD). This component decides on the patient's current health status by combining several kinds of data, which are acquired by wearable sensors, entered manually by the patient, or retrieved from the specific database. If no abnormal situation has been detected, the DSS takes no action and remains deactivated until the next batch of abnormal data is acquired or the next scheduled data transmission occurs. The DSS that has been implemented is an integrated classification system with two parallel classifiers, combining an expert system (rule-based system) and a supervised classifier, such as Support Vector Machines (SVM), Random Forests, artificial Neural Networks (aNN, like the Multi-Layer Perceptron), Decision Trees and Naïve Bayes. The system categorized above is useful for providing critical information about the health status of the patient.
Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
Communication overhead on the Intel Paragon, IBM SP2 and Meiko CS-2
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.
1995-01-01
Interprocessor communication overhead is a crucial measure of the power of parallel computing systems: its impact can severely limit the performance of parallel programs. This report presents measurements of communication overhead on three contemporary commercial multicomputer systems: the Intel Paragon, the IBM SP2 and the Meiko CS-2. In each case the time to communicate between processors is presented as a function of message length. The time for global synchronization and memory access is discussed. The performance of these machines in emulating hypercubes and executing random pairwise exchanges is also investigated. It is shown that the interprocessor communication time depends heavily on the specific communication pattern required. These observations contradict the commonly held belief that communication overhead on contemporary machines is independent of the placement of tasks on processors. The information presented in this report permits the evaluation of the efficiency of parallel algorithm implementations against standard baselines.
A system for routing arbitrary directed graphs on SIMD architectures
NASA Technical Reports Server (NTRS)
Tomboulian, Sherryl
1987-01-01
There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be traversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal.
Computational Performance of a Parallelized Three-Dimensional High-Order Spectral Element Toolbox
NASA Astrophysics Data System (ADS)
Bosshard, Christoph; Bouffanais, Roland; Clémençon, Christian; Deville, Michel O.; Fiétier, Nicolas; Gruber, Ralf; Kehtari, Sohrab; Keller, Vincent; Latt, Jonas
In this paper, a comprehensive performance review of an MPI-based high-order three-dimensional spectral element method C++ toolbox is presented. The focus is put on the performance evaluation of several aspects with a particular emphasis on the parallel efficiency. The performance evaluation is analyzed with the help of a time prediction model based on a parameterization of the application and the hardware resources. A tailor-made CFD computation benchmark case is introduced and used to carry out this review, stressing the particular interest for clusters with up to 8192 cores. Some problems in the parallel implementation have been detected and corrected. The theoretical complexities with respect to the number of elements, to the polynomial degree, and to communication needs are correctly reproduced. It is concluded that this type of code has a nearly perfect speedup on machines with thousands of cores, and is ready to make the step to next-generation petaflop machines.
Experimental Realization of a Quantum Support Vector Machine
NASA Astrophysics Data System (ADS)
Li, Zhaokai; Liu, Xiaomei; Xu, Nanyang; Du, Jiangfeng
2015-04-01
The fundamental principle of artificial intelligence is the ability of machines to learn from previous experience and do future work accordingly. In the age of big data, classical learning machines often require huge computational resources in many practical cases. Quantum machine learning algorithms, on the other hand, could be exponentially faster than their classical counterparts by utilizing quantum parallelism. Here, we demonstrate a quantum machine learning algorithm to implement handwriting recognition on a four-qubit NMR test bench. The quantum machine learns standard character fonts and then recognizes handwritten characters from a set with two candidates. Because of the widespread importance of artificial intelligence and its tremendous consumption of computational resources, quantum speedup would be extremely attractive against the challenges of big data.
LHC Status and Upgrade Challenges
NASA Astrophysics Data System (ADS)
Smith, Jeffrey
2009-11-01
The Large Hadron Collider has had a trying start-up and a challenging operational future lies ahead. Critical to the machine's performance is controlling a beam of particles whose stored energy is equivalent to 80 kg of TNT. Unavoidable beam losses result in energy deposition throughout the machine and without adequate protection this power would result in quenching of the superconducting magnets. A brief overview of the machine layout and principles of operation will be given, including a summary of the September 2008 accident. The current status of the LHC, startup schedule and upgrade options to achieve the target luminosity will be presented.
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed-forward neural networks, although this method is extendable to recurrent networks.
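The logarithmic cost of the global summation mentioned in this abstract can be seen in a small simulation of a pairwise tree reduction. This is a generic sketch rather than the CM-2 mapping itself: each list entry stands for the partial sum held by one processor, and the name `tree_sum` is illustrative. In every step, processors a fixed stride apart combine their values, and the stride doubles, so the number of steps is the base-2 logarithm of the processor count, rounded up.

```python
def tree_sum(partials):
    """Simulate a tree reduction over per-processor partial sums.

    Returns (total, steps). All additions inside one step touch disjoint
    pairs, so on real hardware they could happen simultaneously.
    Assumes at least one processor.
    """
    values = list(partials)
    n = len(values)
    stride = 1
    steps = 0
    while stride < n:
        # processor i receives from processor i + stride
        for i in range(0, n - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        steps += 1
    return values[0], steps


print(tree_sum(range(1, 9)))
print(tree_sum([1.0] * 64))
```

Doubling the number of processors adds only one more combining step, which is why this reduction contributes so little overhead compared with the purely local work done on each processor.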
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
PUP: An Architecture to Exploit Parallel Unification in Prolog
1988-03-01
environment stacking model similar to the Warren Abstract Machine [23] since it has been shown to be superior to other known models (see [21]). The storage...execute in groups of independent operations. Unifications belonging to different groups may not overlap. Also unification operations belonging to the...since all parallel operations on the unification units must complete before any of the units can start executing the next group of parallel
Fast adaptive composite grid methods on distributed parallel architectures
NASA Technical Reports Server (NTRS)
Lemke, Max; Quinlan, Daniel
1992-01-01
The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under a variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
An obstacle to building a time machine
NASA Astrophysics Data System (ADS)
Carroll, Sean M.; Farhi, Edward; Guth, Alan H.
1992-01-01
Gott (1991) has shown that a spacetime with two infinite parallel cosmic strings passing each other with sufficient velocity contains closed timelike curves. An attempt to build such a time machine is discussed. Using the energy-momentum conservation laws in the equivalent (2 + 1)-dimensional theory, the spacetime representing the decay of one gravitating particle into two is explicitly constructed; there is never enough mass in an open universe to build the time machine from the products of decays of stationary particles. More generally, the Gott time machine cannot exist in any open (2 + 1)-dimensional universe for which the total momentum is timelike.
RAMA: A file system for massively parallel computers
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1993-01-01
This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Code Optimization and Parallelization on the Origins: Looking from Users' Perspective
NASA Technical Reports Server (NTRS)
Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)
2002-01-01
Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.
Sensibility study in a flexible job shop scheduling problem
NASA Astrophysics Data System (ADS)
Curralo, Ana; Pereira, Ana I.; Barbosa, José; Leitão, Paulo
2013-10-01
This paper assesses the impact of the job order on the optimal time of operations in a Flexible Job Shop Scheduling Problem. In this work a real assembly cell was studied: the AIP-PRIMECA cell at the Université de Valenciennes et du Hainaut-Cambrésis, in France, which is treated as a Flexible Job Shop problem. The problem consists of finding the schedule of machine operations, taking into account the precedence constraints. The main objective is to minimize the batch makespan, i.e. the finish time of the last operation completed in the schedule. In short, the present study evaluates whether the job order affects the optimal time of the operations schedule. A genetic algorithm was used to solve the optimization problem. The conclusion is that the job order influences the optimal time.
NASA Technical Reports Server (NTRS)
Chew, W. C.; Song, J. M.; Lu, C. C.; Weedon, W. H.
1995-01-01
In the first phase of our work, we have concentrated on laying the foundation to develop fast algorithms, including the use of recursive structure like the recursive aggregate interaction matrix algorithm (RAIMA), the nested equivalence principle algorithm (NEPAL), the ray-propagation fast multipole algorithm (RPFMA), and the multi-level fast multipole algorithm (MLFMA). We have also investigated the use of curvilinear patches to build a basic method of moments code where these acceleration techniques can be used later. In the second phase, which is mainly reported on here, we have concentrated on implementing three-dimensional NEPAL on a massively parallel machine, the Connection Machine CM-5, and have been able to obtain some 3D scattering results. In order to understand the parallelization of codes on the Connection Machine, we have also studied the parallelization of 3D finite-difference time-domain (FDTD) code with PML material absorbing boundary condition (ABC). We found that simple algorithms like the FDTD with material ABC can be parallelized very well allowing us to solve within a minute a problem of over a million nodes. In addition, we have studied the use of the fast multipole method and the ray-propagation fast multipole algorithm to expedite matrix-vector multiplication in a conjugate-gradient solution to integral equations of scattering. We find that these methods are faster than LU decomposition for one incident angle, but are slower than LU decomposition when many incident angles are needed as in the monostatic RCS calculations.
Highly fault-tolerant parallel computation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spielman, D.A.
We re-introduce the coded model of fault-tolerant computation in which the input and output of a computational device are treated as words in an error-correcting code. A computational device correctly computes a function in the coded model if its input and output, once decoded, are a valid input and output of the function. In the coded model, it is reasonable to hope to simulate all computational devices by devices whose size is greater by a constant factor but which are exponentially reliable even if each of their components can fail with some constant probability. We consider fine-grained parallel computations in which each processor has a constant probability of producing the wrong output at each time step. We show that any parallel computation that runs for time $t$ on $w$ processors can be performed reliably on a faulty machine in the coded model using $w \log^{O(1)} w$ processors and time $t \log^{O(1)} w$. The failure probability of the computation will be at most $t \cdot \exp(-w^{1/4})$. The codes used to communicate with our fault-tolerant machines are generalized Reed-Solomon codes and can thus be encoded and decoded in $O(n \log^{O(1)} n)$ sequential time and are independent of the machine they are used to communicate with. We also show how coded computation can be used to self-correct many linear functions in parallel with arbitrarily small overhead.
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube
NASA Technical Reports Server (NTRS)
Joslin, Ronald D.; Zubair, Mohammad
1993-01-01
The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine-dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine exhibits less than ideal speedups. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model, which reduces the required number of grid points and becomes a large-eddy simulation (PSLES), would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
NASA Astrophysics Data System (ADS)
Chang, Faliang; Liu, Chunsheng
2017-09-01
The high variability of sign colors and shapes in uncontrolled environments has made the detection of traffic signs a challenging problem in computer vision. We propose a traffic sign detection (TSD) method based on a coarse-to-fine cascade and parallel support vector machine (SVM) detectors to detect Chinese warning and danger traffic signs. First, a region of interest (ROI) extraction method is proposed to extract ROIs using color contrast features in local regions. The ROI extraction can reduce scanning regions and save detection time. For multiclass TSD, we propose a structure that combines a coarse-to-fine cascaded tree with a parallel structure of histogram of oriented gradients (HOG) + SVM detectors. The cascaded tree is designed to detect different types of traffic signs in a coarse-to-fine process. The parallel HOG + SVM detectors are designed to do fine detection of different types of traffic signs. The experiments demonstrate that the proposed TSD method can rapidly detect multiclass traffic signs of different colors and shapes with high accuracy.
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
A microeconomic scheduler for parallel computers
NASA Technical Reports Server (NTRS)
Stoica, Ion; Abdel-Wahab, Hussein; Pothen, Alex
1995-01-01
We describe a scheduler based on the microeconomic paradigm for scheduling on-line a set of parallel jobs in a multiprocessor system. In addition to the classical objectives of increasing the system throughput and reducing the response time, we consider fairness in allocating system resources among the users, and providing the user with control over the relative performances of his jobs. We associate with every user a savings account in which he receives money at a constant rate. When a user wants to run a job, he creates an expense account for that job to which he transfers money from his savings account. The job uses the funds in its expense account to obtain the system resources it needs for execution. The share of the system resources allocated to the user is directly related to the rate at which the user receives money; the rate at which the user transfers money into a job expense account controls the job's performance. We prove that starvation is not possible in our model. Simulation results show that our scheduler improves both system and user performances in comparison with two different variable partitioning policies. It is also shown to be effective in guaranteeing fairness and providing control over the performance of jobs.
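The account mechanism described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' scheduler: the class names, the proportional-share allocation rule, and the choice to spend each job's whole expense account every tick are all hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    transfer_rate: float      # money moved from the owner's savings per tick
    expense: float = 0.0      # funds the job can spend on processors


@dataclass
class User:
    name: str
    income_rate: float        # money received per tick at a constant rate
    savings: float = 0.0
    jobs: list = field(default_factory=list)


def tick(users, processors):
    """Advance one time step and return a list of (job name, processors granted).

    Assumed rule (hypothetical): processors are divided in proportion to the
    funds held in each job's expense account, and those funds are then spent.
    """
    active = []
    for user in users:
        user.savings += user.income_rate
        for job in user.jobs:
            moved = min(job.transfer_rate, user.savings)
            user.savings -= moved
            job.expense += moved
            active.append(job)
    total = sum(job.expense for job in active)
    grants = []
    for job in active:
        share = processors * job.expense / total if total > 0 else 0.0
        grants.append((job.name, share))
        job.expense = 0.0     # the funds are consumed by this tick's allocation
    return grants
```

In this sketch a user with twice the income rate can fund twice the share of the machine, and the per-job transfer rate is the knob that sets relative job performance, which mirrors the two relationships stated in the abstract.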
A Parallel Vector Machine for the PM Programming Language
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. 
Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
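The two mask representations mentioned in this abstract can be illustrated with a short sketch. It is written in Python rather than PM, it is not the PM virtual machine's instruction set, and the function names and the 0.25 sparsity threshold are assumptions chosen only for illustration.

```python
def masked_add_boolean(dst, src, mask):
    """Add src into dst only where the dense Boolean mask is True."""
    return [d + s if m else d for d, s, m in zip(dst, src, mask)]


def masked_add_locations(dst, src, active):
    """Same operation, with the mask stored as a list of active cell indices."""
    out = list(dst)
    for i in active:
        out[i] += src[i]
    return out


def choose_mask(mask, sparsity_threshold=0.25):
    """Pick a representation for the mask.

    Switches to a location list when the fraction of active cells is at or
    below the threshold (an arbitrary illustrative value), otherwise keeps
    the Boolean form.
    """
    active = [i for i, m in enumerate(mask) if m]
    if len(active) <= sparsity_threshold * len(mask):
        return ("locations", active)
    return ("boolean", mask)
```

The location-list form touches only the active cells, so its cost scales with the number of active cells rather than with the vector length, which is one plausible reason to switch representation when the active set becomes sparse.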
A distributed version of the NASA Engine Performance Program
NASA Technical Reports Server (NTRS)
Cours, Jeffrey T.; Curlett, Brian P.
1993-01-01
Distributed NEPP, a version of the NASA Engine Performance Program, uses the original NEPP code but executes it in a distributed computer environment. Multiple workstations connected by a network increase the program's speed and, more importantly, the complexity of the cases it can handle in a reasonable time. Distributed NEPP uses the public domain software package, called Parallel Virtual Machine, allowing it to execute on clusters of machines containing many different architectures. It includes the capability to link with other computers, allowing them to process NEPP jobs in parallel. This paper discusses the design issues and granularity considerations that entered into programming Distributed NEPP and presents the results of timing runs.
NASA Astrophysics Data System (ADS)
Battaïa, Olga; Dolgui, Alexandre; Guschinsky, Nikolai; Levin, Genrikh
2014-10-01
Solving equipment selection and line balancing problems together allows better line configurations to be reached and avoids local optimal solutions. This article considers jointly these two decision problems for mass production lines with serial-parallel workplaces. This study was motivated by the design of production lines based on machines with rotary or mobile tables. Nevertheless, the results are more general and can be applied to assembly and production lines with similar structures. The designers' objectives and the constraints are studied in order to suggest a relevant mathematical model and an efficient optimization approach to solve it. A real case study is used to validate the model and the developed approach.
A meta-heuristic method for solving scheduling problem: crow search algorithm
NASA Astrophysics Data System (ADS)
Adhi, Antono; Santosa, Budi; Siswanto, Nurhadi
2018-04-01
Scheduling is one of the most important processes in industry, in both manufacturing and services. Scheduling is the process of selecting resources to perform operations on tasks. Resources can be machines, people, tasks, jobs or operations. Selecting the optimum sequence of jobs from a permutation is an essential issue in scheduling research; the optimum sequence is the optimum solution of the scheduling problem. The scheduling problem becomes an NP-hard problem because the number of jobs in the sequence exceeds what an exact algorithm can normally process. Obtaining optimum results therefore requires a method capable of solving complex scheduling problems in an acceptable time. Meta-heuristics are commonly used to solve scheduling problems. The recently published Crow Search Algorithm (CSA) is adopted in this research to solve the scheduling problem. CSA is an evolutionary meta-heuristic based on the behavior of crows in flocks. The results of CSA on the scheduling problem are compared with those of other algorithms. The comparison shows that CSA performs better than the other algorithms in terms of solution quality and computation time.
Spike: Artificial intelligence scheduling for Hubble space telescope
NASA Technical Reports Server (NTRS)
Johnston, Mark; Miller, Glenn; Sponsler, Jeff; Vick, Shon; Jackson, Robert
1990-01-01
Efficient utilization of spacecraft resources is essential, but the accompanying scheduling problems are often computationally intractable and are difficult to approximate because of the presence of numerous interacting constraints. Artificial intelligence techniques were applied to the scheduling of the NASA/ESA Hubble Space Telescope (HST). This presents a particularly challenging problem since a yearlong observing program can contain some tens of thousands of exposures which are subject to a large number of scientific, operational, spacecraft, and environmental constraints. New techniques were developed for machine reasoning about scheduling constraints and goals, especially in cases where uncertainty is an important scheduling consideration and where resolving conflicts among conflicting preferences is essential. These techniques were utilized in a set of workstation-based scheduling tools (Spike) for HST. Graphical displays of activities, constraints, and schedules are an important feature of the system. High-level scheduling strategies using both rule-based and neural network approaches were developed. While the specific constraints implemented are those most relevant to HST, the framework developed is far more general and could easily handle other kinds of scheduling problems. The concept and implementation of the Spike system are described along with some experiments in adapting Spike to other spacecraft scheduling domains.
Optimisation of a parallel ocean general circulation model
NASA Astrophysics Data System (ADS)
Beare, M. I.; Stevens, D. P.
1997-10-01
This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Knowledge representation into Ada parallel processing
NASA Technical Reports Server (NTRS)
Masotto, Tom; Babikyan, Carol; Harper, Richard
1990-01-01
The Knowledge Representation into Ada Parallel Processing project is a joint NASA and Air Force funded project to demonstrate the execution of intelligent systems in Ada on the Charles Stark Draper Laboratory fault-tolerant parallel processor (FTPP). Two applications were demonstrated - a portion of the adaptive tactical navigator and a real time controller. Both systems are implemented as Activation Framework Objects on the Activation Framework intelligent scheduling mechanism developed by Worcester Polytechnic Institute. The implementations, results of performance analyses showing speedup due to parallelism and initial efficiency improvements are detailed and further areas for performance improvements are suggested.
Method for resource control in parallel environments using program organization and run-time support
NASA Technical Reports Server (NTRS)
Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)
2001-01-01
A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
Method for resource control in parallel environments using program organization and run-time support
NASA Technical Reports Server (NTRS)
Ekanadham, Kattamuri (Inventor); Moreira, Jose Eduardo (Inventor); Naik, Vijay Krishnarao (Inventor)
1999-01-01
A system and method for dynamic scheduling and allocation of resources to parallel applications during the course of their execution. By establishing well-defined interactions between an executing job and the parallel system, the system and method support dynamic reconfiguration of processor partitions, dynamic distribution and redistribution of data, communication among cooperating applications, and various other monitoring actions. The interactions occur only at specific points in the execution of the program where the aforementioned operations can be performed efficiently.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Allan Ray
1987-05-01
Increases in high speed hardware have mandated studies in software techniques to exploit the parallel capabilities. This thesis examines the effects a run-time scheduler has on a multiprocessor. The model consists of directed, acyclic graphs, generated from serial FORTRAN benchmark programs by the parallel compiler Parafrase. A multitasked, multiprogrammed environment is created. Dependencies are generated by the compiler. Tasks are bidimensional, i.e., they may specify both time and processor requests. Processor requests may be folded into execution time by the scheduler. The graphs may arrive at arbitrary time intervals. The general case is NP-hard, thus, a variety of heuristics are examined by a simulator. Multiprogramming demonstrates a greater need for a run-time scheduler than does monoprogramming for a variety of reasons, e.g., greater stress on the processors, a larger number of independent control paths, more variety in the task parameters, etc. The dynamic critical path series of algorithms perform well. Dynamic critical volume did not add much. Unfortunately, dynamic critical path maximizes turnaround time as well as throughput. Two schedulers are presented which balance throughput and turnaround time. The first requires classification of jobs by type; the second requires selection of a ratio value which is dependent upon system parameters. 45 refs., 19 figs., 20 tabs.
Ebrahimi, Ahmad; Kia, Reza; Komijan, Alireza Rashidi
2016-01-01
In this article, a novel integrated mixed-integer nonlinear programming model is presented for designing a cellular manufacturing system (CMS) considering machine layout and part scheduling problems simultaneously as interrelated decisions. The integrated CMS model is formulated to incorporate several design features including part due date, material handling time, operation sequence, processing time, an intra-cell layout of unequal-area facilities, and part scheduling. The objective function is to minimize makespan, tardiness penalties, and material handling costs of inter-cell and intra-cell movements. Two numerical examples are solved by the Lingo software to illustrate the results obtained by the incorporated features. In order to assess the effects and importance of integrating machine layout and part scheduling in designing a CMS, two approaches, sequential and concurrent, are investigated, and the improvement resulting from the concurrent approach is revealed. Also, due to the NP-hardness of the integrated model, an efficient genetic algorithm (GA) is designed. The computational results of this study indicate that the best solutions found by GA are better than the solutions found by branch and bound (B&B), and are obtained in much less time, for both sequential and concurrent approaches. Moreover, comparison of the objective function values (OFVs) obtained by the sequential and concurrent approaches demonstrates that the OFV improvement averages around 17% with GA and 14% with B&B.
Archer, Charles J; Blocksome, Michael A; Peters, Amanda E; Ratterman, Joseph D; Smith, Brian E
2012-10-16
Methods, apparatus, and products are disclosed for scheduling applications for execution on a plurality of compute nodes of a parallel computer to manage temperature of the plurality of compute nodes during execution that include: identifying one or more applications for execution on the plurality of compute nodes; creating a plurality of physically discontiguous node partitions in dependence upon temperature characteristics for the compute nodes and a physical topology for the compute nodes, each discontiguous node partition specifying a collection of physically adjacent compute nodes; and assigning, for each application, that application to one or more of the discontiguous node partitions for execution on the compute nodes specified by the assigned discontiguous node partitions.
Improved and Cost Effective Machining Techniques for Tracked Combat Vehicle Parts
1983-10-01
steel is shown in Figure 7-7 and consists of tempered martensite. Three of the alloys which are used in the gas turbine engine are cast 17-4PH ...stainless steel, Inconel 718 and Inconel 713. The 17-4PH stainless steel was machined in the solution treated and aged condition. The microstructure as shown...
Scheduling a Medium-Sized Manufacturing Shop: A Simulation Study
1993-09-01
distinction, elements of work order data include: the minimum machine type required for a work order, as well as the programming, set-up, and machining... prevent this from happening. Such a mechanism could take the form of a reprioritization function that is executed after a specified period of time...system for a very long time unless some mechanism is used to prevent this from happening. The jobs left in the system will be the ones that have very
NASA Astrophysics Data System (ADS)
Zhang, Xingong; Yin, Yunqiang; Wu, Chin-Chia
2017-01-01
There is a situation found in many manufacturing systems, such as steel rolling mills, fire fighting or single-server cycle-queues, where a job that is processed later consumes more time than that same job when processed earlier. The research finds that machine maintenance can improve the worsening of processing conditions. After maintenance activity, the machine will be restored. The maintenance duration is a positive and non-decreasing differentiable convex function of the total processing times of the jobs between maintenance activities. Motivated by this observation, the makespan and the total completion time minimization problems in the scheduling of jobs with non-decreasing rates of job processing time on a single machine are considered in this article. It is shown that both the makespan and the total completion time minimization problems are NP-hard in the strong sense when the number of maintenance activities is arbitrary, while the makespan minimization problem is NP-hard in the ordinary sense when the number of maintenance activities is fixed. If the deterioration rates of the jobs are identical and the maintenance duration is a linear function of the total processing times of the jobs between maintenance activities, then this article shows that the group balance principle is satisfied for the makespan minimization problem. Furthermore, two polynomial-time algorithms are presented for solving the makespan problem and the total completion time problem under identical deterioration rates, respectively.
Resource Management in Constrained Dynamic Situations
NASA Astrophysics Data System (ADS)
Seok, Jinwoo
Resource management is considered in this dissertation for systems with limited resources, possibly combined with other system constraints, in unpredictably dynamic environments. Resources may represent fuel, power, capabilities, energy, and so on. Resource management is important for many practical systems; usually, resources are limited, and their use must be optimized. Furthermore, systems are often constrained, and constraints must be satisfied for safe operation. Simplistic resource management can result in poor use of resources and failure of the system. Furthermore, many real-world situations involve dynamic environments. Many traditional problems are formulated based on the assumptions of given probabilities or perfect knowledge of future events. However, in many cases, the future is completely unknown, and information on or probabilities about future events are not available. In other words, we operate in unpredictably dynamic situations. Thus, a method is needed to handle dynamic situations without knowledge of the future, but few formal methods have been developed to address them. Thus, the goal is to design resource management methods for constrained systems, with limited resources, in unpredictably dynamic environments. To this end, resource management is organized hierarchically into two levels: 1) planning, and 2) control. In the planning level, the set of tasks to be performed is scheduled based on limited resources to maximize resource usage in unpredictably dynamic environments. In the control level, the system controller is designed to follow the schedule by considering all the system constraints for safe and efficient operation. Consequently, this dissertation is mainly divided into two parts: 1) planning level design, based on finite state machines, and 2) control level methods, based on model predictive control. 
We define a recomposable restricted finite state machine to handle limited resource situations and unpredictably dynamic environments for the planning level. To obtain a policy, dynamic programming is applied, and to obtain a solution, limited breadth-first search is applied to the recomposable restricted finite state machine. A multi-function phased array radar resource management problem and an unmanned aerial vehicle patrolling problem are treated using recomposable restricted finite state machines. Then, we use model predictive control for the control level, because it allows constraint handling and setpoint tracking for the schedule. An aircraft power system management problem is treated that aims to develop an integrated control system for an aircraft gas turbine engine and electrical power system using rate-based model predictive control. Our results indicate that at the planning level, limited breadth-first search for recomposable restricted finite state machines generates good scheduling solutions in limited resource situations and unpredictably dynamic environments. The importance of cooperation in the planning level is also verified. At the control level, a rate-based model predictive controller allows good schedule tracking and safe operations. The importance of considering the system constraints and interactions between the subsystems is indicated. For the best resource management in constrained dynamic situations, the planning level and the control level need to be considered together.
NASA Astrophysics Data System (ADS)
Budi Harja, Herman; Prakosa, Tri; Raharno, Sri; Yuwana Martawirya, Yatna; Nurhadi, Indra; Setyo Nogroho, Alamsyah
2018-03-01
In the job-shop industry, products come in wide variety but small quantities, so every machine tool is shared among production processes under a dynamic load. This dynamic operating condition directly affects the reliability of machine tool components. Hence, the maintenance schedule for every component should be determined from the actual usage of the machine tool components. This paper describes the development of a monitoring system that obtains information about the usage of each CNC machine tool component in real time, by grouping components according to their operation phase. A special device has been developed for monitoring machine tool component usage, utilizing usage-phase activity data taken from certain electronic components within the CNC machine. The components are the adaptor, servo driver and spindle driver, as well as some additional components such as a microcontroller and relays. The obtained data are used to detect machine utilization phases such as the power-on state, machine-ready state or spindle-running state. Experimental results have shown that the developed CNC machine tool monitoring system is capable of obtaining phase information of machine tool usage as well as its duration, and displays the information in the user interface application.
Performance analysis of a large-grain dataflow scheduling paradigm
NASA Technical Reports Server (NTRS)
Young, Steven D.; Wills, Robert W.
1993-01-01
A paradigm for scheduling computations on a network of multiprocessors using large-grain data flow scheduling at run time is described and analyzed. The computations to be scheduled must follow a static flow graph, while the schedule itself will be dynamic (i.e., determined at run time). Many applications characterized by static flow exist, and they include real-time control and digital signal processing. With the advent of computer-aided software engineering (CASE) tools for capturing software designs in dataflow-like structures, macro-dataflow scheduling becomes increasingly attractive, if not necessary. For parallel implementations, using the macro-dataflow method allows the scheduling to be insulated from the application designer and enables the maximum utilization of available resources. Further, by allowing multitasking, processor utilizations can approach 100 percent while they maintain maximum speedup. Extensive simulation studies are performed on 4-, 8-, and 16-processor architectures that reflect the effects of communication delays, scheduling delays, algorithm class, and multitasking on performance and speedup gains.
Cario, Clinton L; Witte, John S
2018-03-15
As whole-genome tumor sequence and biological annotation datasets grow in size, number and content, there is an increasing basic science and clinical need for efficient and accurate data management and analysis software. With the emergence of increasingly sophisticated data stores, execution environments and machine learning algorithms, there is also a need for the integration of functionality across frameworks. We present orchid, a Python-based software package for the management, annotation and machine learning of cancer mutations. Building on technologies of parallel workflow execution, in-memory database storage and machine learning analytics, orchid efficiently handles millions of mutations and hundreds of features in an easy-to-use manner. We describe the implementation of orchid and demonstrate its ability to distinguish tissue of origin in 12 tumor types based on 339 features using a random forest classifier. Orchid and our annotated tumor mutation database are freely available at https://github.com/wittelab/orchid. Software is implemented in Python 2.7, and makes use of MySQL or MemSQL databases. Groovy 2.4.5 is optionally required for parallel workflow execution. Contact: JWitte@ucsf.edu. Supplementary data are available at Bioinformatics online.
Multiresource allocation and scheduling for periodic soft real-time applications
NASA Astrophysics Data System (ADS)
Gopalan, Kartik; Chiueh, Tzi-cker
2001-12-01
Real-time applications that utilize multiple system resources, such as CPU, disks, and network links, require coordinated scheduling of these resources in order to meet their end-to-end performance requirements. Most state-of-the-art operating systems support independent resource allocation and deadline-driven scheduling but lack coordination among multiple heterogeneous resources. This paper describes the design and implementation of an Integrated Real-time Resource Scheduler (IRS) that performs coordinated allocation and scheduling of multiple heterogeneous resources on the same machine for periodic soft real-time applications. The principal feature of IRS is a heuristic multi-resource allocation algorithm that reserves multiple resources for real-time applications in a manner that can maximize the number of applications admitted into the system in the long run. At run-time, a global scheduler dispatches the tasks of the soft real-time application to individual resource schedulers according to the precedence constraints between tasks. The individual resource schedulers, which could be any deadline-based schedulers, can make scheduling decisions locally and yet collectively satisfy a real-time application's performance requirements. The tightness of overall timing guarantees is ultimately determined by the properties of individual resource schedulers. However, IRS maximizes overall system resource utilization efficiency by coordinating deadline assignment across multiple tasks in a soft real-time application.
Code of Federal Regulations, 2010 CFR
2010-07-01
... noted after the country or area. Schedule (1) North Korea, i.e., Korea north of the 38th parallel of north latitude: December 17, 1950. (2) Cambodia: April 17, 1975. (3) North Vietnam; i.e., Vietnam north of the 17th parallel of north latitude: May 5, 1964. (4) South Vietnam, i.e., Vietnam south of the...
Implementations of BLAST for parallel computers.
Jülich, A
1995-02-01
The BLAST sequence comparison programs have been ported to a variety of parallel computers: the shared memory machine Cray Y-MP 8/864 and the distributed memory architectures Intel iPSC/860 and nCUBE. Additionally, the programs were ported to run on workstation clusters. We explain the parallelization techniques and consider the pros and cons of these methods. The BLAST programs are very well suited for parallelization for a moderate number of processors. We illustrate our results using the program blastp as an example. As input data for blastp, a 799 residue protein query sequence and the protein database PIR were used.
Tile-based Level of Detail for the Parallel Age
DOE Office of Scientific and Technical Information (OSTI.GOV)
Niski, K; Cohen, J D
Today's PCs incorporate multiple CPUs and GPUs and are easily arranged in clusters for high-performance, interactive graphics. We present an approach based on hierarchical, screen-space tiles to parallelizing rendering with level of detail. Adapt tiles, render tiles, and machine tiles are associated with CPUs, GPUs, and PCs, respectively, to efficiently parallelize the workload with good resource utilization. Adaptive tile sizes provide load balancing while our level of detail system allows total and independent management of the load on CPUs and GPUs. We demonstrate our approach on parallel configurations consisting of both single PCs and a cluster of PCs.
Open shop scheduling problem to minimize total weighted completion time
NASA Astrophysics Data System (ADS)
Bai, Danyu; Zhang, Zhihai; Zhang, Qiang; Tang, Mengqian
2017-01-01
A given number of jobs in an open shop scheduling environment must each be processed for given amounts of time on each of a given set of machines in an arbitrary sequence. This study aims to achieve a schedule that minimizes total weighted completion time. Owing to the strong NP-hardness of the problem, the weighted shortest processing time block (WSPTB) heuristic is presented to obtain approximate solutions for large-scale problems. Performance analysis proves the asymptotic optimality of the WSPTB heuristic in the sense of probability limits. The largest weight block rule is provided to seek optimal schedules in polynomial time for a special case. A hybrid discrete differential evolution algorithm is designed to obtain high-quality solutions for moderate-scale problems. Simulation experiments demonstrate the effectiveness of the proposed algorithms.
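For orientation, the classic weighted shortest processing time (WSPT) rule, from which heuristics of this family take their name, orders jobs by nonincreasing weight-to-processing-time ratio. The sketch below applies that rule to a single machine only. It is not the WSPTB block heuristic of the abstract, and the function name `wspt_order` and the sample data are illustrative assumptions.

```python
from typing import List, Tuple


def wspt_order(jobs: List[Tuple[float, float]]) -> Tuple[List[int], float]:
    """Sequence jobs on one machine by the WSPT (ratio) rule.

    Each job is (processing_time, weight). Returns the job order and the
    total weighted completion time of that order.
    """
    # Sort job indices by weight / processing-time ratio, largest first.
    order = sorted(range(len(jobs)),
                   key=lambda j: jobs[j][1] / jobs[j][0],
                   reverse=True)
    clock = 0.0
    total = 0.0
    for j in order:
        p, w = jobs[j]
        clock += p          # completion time of job j
        total += w * clock  # its weighted contribution to the objective
    return order, total


# Hypothetical data: (processing_time, weight)
order, cost = wspt_order([(3, 1), (1, 2), (2, 2)])
```

In an open shop every job must also visit every machine, so a ratio ordering like this one can at most serve as a building block, which is presumably why the abstract works with blocks rather than single operations.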
ERIC Educational Resources Information Center
Instructor, 1983
1983-01-01
Instructor's Computer-Using Teachers Board members give practical tips on how to get a classroom ready for a new computer, introduce students to the machine, and help them learn about programing and computer literacy. Safety, scheduling, and supervision requirements are noted. (PP)
Design consideration in constructing high performance embedded Knowledge-Based Systems (KBS)
NASA Technical Reports Server (NTRS)
Dalton, Shelly D.; Daley, Philip C.
1988-01-01
As the hardware trends for artificial intelligence (AI) involve more and more complexity, the process of optimizing the computer system design for a particular problem will also increase in complexity. Space applications of knowledge based systems (KBS) will often require an ability to perform both numerically intensive vector computations and real time symbolic computations. Although parallel machines can theoretically achieve the speeds necessary for most of these problems, if the application itself is not highly parallel, the machine's power cannot be utilized. A scheme is presented which will provide the computer systems engineer with a tool for analyzing machines with various configurations of array, symbolic, scalar, and multiprocessors. High speed networks and interconnections make customized, distributed, intelligent systems feasible for the application of AI in space. The method presented can be used to optimize such AI system configurations and to make comparisons between existing computer systems. It is an open question whether or not, for a given mission requirement, a suitable computer system design can be constructed for any amount of money.
Zhao, Chuan-Li; Hsu, Hua-Feng
2014-01-01
This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm. PMID:25258727
Minimization of Delay Costs in the Realization of Production Orders in Two-Machine System
NASA Astrophysics Data System (ADS)
Dylewski, Robert; Jardzioch, Andrzej; Dworak, Oliver
2018-03-01
The article presents a new algorithm that determines the optimal schedule of production orders in a two-machine system based on the minimum cost of order delays. The algorithm uses the branch and bound method and generalises an algorithm that determines the sequence of production orders with the minimal sum of delays. To illustrate the proposed algorithm, the article contains examples accompanied by graphical solution trees. Research analysing the utility of the algorithm was conducted, and the results confirmed its usefulness when applied to the scheduling of orders. The algorithm was implemented in Matlab. In addition, studies for different sets of production orders were conducted.
Zhao, Chuan-Li; Hsu, Chou-Jung; Hsu, Hua-Feng
2014-01-01
This paper considers single machine scheduling and due date assignment with setup time. The setup time is proportional to the length of the already processed jobs; that is, the setup time is past-sequence-dependent (p-s-d). It is assumed that a job's processing time depends on its position in a sequence. The objective functions include total earliness, the weighted number of tardy jobs, and the cost of due date assignment. We analyze these problems with two different due date assignment methods. We first consider the model with job-dependent position effects. For each case, by converting the problem to a series of assignment problems, we proved that the problems can be solved in O(n^4) time. For the model with job-independent position effects, we proved that the problems can be solved in O(n^3) time by providing a dynamic programming algorithm.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using the asynchronous BUFFER IN and BUFFER OUT, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
The Complexity of Parallel Algorithms,
1985-11-01
programs have been written for sequential computers. Many people want compilers that will compile the code for parallel machines, to avoid...between two vertices. We also rely on parallel algorithms for maintaining data structures and manipulating graphs. We do not go into the details of these...paths and maintain connected components. The routine is: ExtendPath(r, Q, V) begin P ← 0; s ← while there is a path in V - P from s to a vertex
Parallel Computing: Some Activities in High Energy Physics
NASA Astrophysics Data System (ADS)
Willers, Ian
This paper examines some activities in High Energy Physics that utilise parallel computing. The topic includes all computing from the proposed SIMD front end detectors, the farming applications, high-powered RISC processors and the large machines in the computer centers. We start by looking at the motivation behind using parallelism for general purpose computing. The developments around farming are then described from its simplest form to the more complex system in Fermilab. Finally, there is a list of some developments that are happening close to the experiments.
Means and method of balancing multi-cylinder reciprocating machines
Corey, John A.; Walsh, Michael M.
1985-01-01
A virtual balancing axis arrangement is described for multi-cylinder reciprocating piston machines for effectively balancing out imbalanced forces and minimizing residual imbalance moments acting on the crankshaft of such machines without requiring the use of additional parallel-arrayed balancing shafts or complex and expensive gear arrangements. The novel virtual balancing axis arrangement is capable of being designed into multi-cylinder reciprocating piston and crankshaft machines for substantially reducing vibrations induced during operation of such machines with only minimal number of additional component parts. Some of the required component parts may be available from parts already required for operation of auxiliary equipment, such as oil and water pumps used in certain types of reciprocating piston and crankshaft machine so that by appropriate location and dimensioning in accordance with the teachings of the invention, the virtual balancing axis arrangement can be built into the machine at little or no additional cost.
Taxi Time Prediction at Charlotte Airport Using Fast-Time Simulation and Machine Learning Techniques
NASA Technical Reports Server (NTRS)
Lee, Hanbong
2016-01-01
Accurate taxi time prediction is required for enabling efficient runway scheduling that can increase runway throughput and reduce taxi times and fuel consumptions on the airport surface. Currently NASA and American Airlines are jointly developing a decision-support tool called Spot and Runway Departure Advisor (SARDA) that assists airport ramp controllers to make gate pushback decisions and improve the overall efficiency of airport surface traffic. In this presentation, we propose to use Linear Optimized Sequencing (LINOS), a discrete-event fast-time simulation tool, to predict taxi times and provide the estimates to the runway scheduler in real-time airport operations. To assess its prediction accuracy, we also introduce a data-driven analytical method using machine learning techniques. These two taxi time prediction methods are evaluated with actual taxi time data obtained from the SARDA human-in-the-loop (HITL) simulation for Charlotte Douglas International Airport (CLT) using various performance measurement metrics. Based on the taxi time prediction results, we also discuss how the prediction accuracy can be affected by the operational complexity at this airport and how we can improve the fast time simulation model before implementing it with an airport scheduling algorithm in a real-time environment.
Process Development and Micro-Machining of MARBLE Foam-Cored Rexolite Hemi-Shell Ablator Capsules
Randolph, Randall Blaine; Oertel, John A.; Schmidt, Derek William; ...
2016-06-30
For this study, machined CH hemi-shell ablator capsules have been successfully produced by the MST-7 Target Fabrication Team at Los Alamos National Laboratory. Process development and micro-machining techniques have been developed to produce capsules for both the Omega and National Ignition Facility (NIF) campaigns. These capsules are gas filled up to 10 atm and consist of a machined plastic hemi-shell outer layer that accommodates various specially engineered low-density polystyrene foam cores. Machining and assembly of the two-part, step-jointed plastic hemi-shell outer layer required development of new techniques, processes, and tooling while still meeting very aggressive shot schedules for both campaigns. Finally, problems encountered and process improvements will be discussed that describe this very unique, complex capsule design approach through the first Omega proof-of-concept version to the larger NIF version.
FALCON: A distributed scheduler for MIMD architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grimshaw, A.S.; Vivas, V.E. Jr.
1991-01-01
This paper describes FALCON (Fully Automatic Load COordinator for Networks), the scheduler for the Mentat parallel processing system. FALCON has a modular structure and is designed for systems that use a task scheduling mechanism. FALCON is distributed, stable, supports system heterogeneities, and employs a sender-initiated adaptive load sharing policy with static task assignment. FALCON is parameterizable and is implemented in Mentat, a working distributed system. We present the design and implementation of FALCON as well as a brief introduction to those features of the Mentat run-time system that influence FALCON. Performance measures under different scheduler configurations are also presented and analyzed with respect to the system parameters. 36 refs., 8 figs.
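A sender-initiated load sharing policy with static task assignment can be pictured roughly as follows: the node where a task arrives decides once where the task will run, and looks for a less loaded peer only when its own queue is too long. The sketch is a generic illustration of that idea, not FALCON's algorithm or Mentat's interface; `place_task`, the threshold test, and the fixed probe order are all assumptions made for the example.

```python
from typing import Dict, List


def place_task(local: str, load: Dict[str, int],
               probes: List[str], threshold: int) -> str:
    """Choose the node for a newly arrived task (sender-initiated policy).

    The sender keeps the task while its own queue is shorter than
    `threshold`. Otherwise it probes peers in the given order and hands the
    task to the first one whose queue is shorter than `threshold`; if no
    such peer exists, the task stays local.
    """
    if load[local] < threshold:
        target = local
    else:
        target = next((p for p in probes if load[p] < threshold), local)
    load[target] += 1  # static assignment: the task never migrates afterwards
    return target


# Hypothetical queue lengths for three nodes.
load = {"a": 3, "b": 3, "c": 1}
chosen = place_task("a", load, probes=["b", "c"], threshold=3)
```

An adaptive variant would adjust `threshold` or the probe list from observed load, which is left out here.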
A multi-group and preemptable scheduling of cloud resource based on HTCondor
NASA Astrophysics Data System (ADS)
Jiang, Xiaowei; Zou, Jiaheng; Cheng, Yaodong; Shi, Jingyan
2017-10-01
Because virtual machines are flexible, easy to control, and able to provide varied system environments, more and more fields, including high energy physics, use virtualization technology to build distributed systems from virtual resources. This paper introduces a method used in high energy physics that supports multiple resource groups and preemptable cloud resource scheduling by combining virtual machines with HTCondor (a batch system). It makes resource control more flexible and efficient and makes resource scheduling independent of job scheduling. First, the resources belong to different experiment groups, and the mapping from user groups to resource groups (which are the same as experiment groups) is one-to-one or many-to-one. To make these group relationships simple to manage, we designed a permission-controlling component that ensures each resource group receives suitable jobs. Second, to allocate resources elastically to the appropriate resource group, resources must be scheduled in the same way as jobs. The cloud resource scheduler therefore maintains a resource queue and allocates an appropriate amount of virtual resources to the requesting resource group. Third, in some situations resources that have been occupied for a long time need to be preempted. The resource scheduler therefore includes a preemption function based on group priority. Preemption is soft: when virtual resources are preempted, jobs are not killed but held and rematched later. This is implemented with the help of HTCondor by storing the held job information in the scheduler, releasing the job to idle status, and performing a second match. At IHEP (Institute of High Energy Physics), we have built a batch system based on HTCondor with a virtual resource pool based on OpenStack. This paper presents some cases from the JUNO and LHAASO experiments. The results indicate that the scheduler supports multiple groups and soft preemption efficiently. Additionally, the permission-controlling component has been used in the local computing cluster, supporting the JUNO, CMS and LHAASO experiments, and its scale will be expanded to more experiments in the first half of the year, including DYW, BES and so on. This is evidence that the permission control is efficient.
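The priority-based soft preemption described in this abstract can be modelled in a few lines. The sketch below is a toy model under stated assumptions: it does not call HTCondor or OpenStack, and the class names `VM` and `Pool`, the `allocate` method, the group names, and the priority values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VM:
    name: str
    group: Optional[str] = None  # resource group currently owning the VM
    job: Optional[str] = None    # job running on the VM, if any


@dataclass
class Pool:
    priority: Dict[str, int]     # larger number = higher group priority
    vms: List[VM]
    held: List[str] = field(default_factory=list)  # jobs awaiting a rematch

    def allocate(self, group: str) -> Optional[VM]:
        """Give `group` one VM, preempting a lower-priority group if needed."""
        # 1. Prefer a VM that no group owns.
        for vm in self.vms:
            if vm.group is None:
                vm.group = group
                return vm
        # 2. Otherwise take a VM from the lowest-priority group below `group`.
        victims = [vm for vm in self.vms
                   if self.priority[vm.group] < self.priority[group]]
        if not victims:
            return None
        vm = min(victims, key=lambda v: self.priority[v.group])
        # Soft preemption: hold the running job for rematching, do not kill it.
        if vm.job is not None:
            self.held.append(vm.job)
            vm.job = None
        vm.group = group
        return vm


# Hypothetical pool with one VM owned by the lower-priority group.
pool = Pool(priority={"juno": 2, "lhaaso": 1},
            vms=[VM("vm1", group="lhaaso", job="job-17")])
taken = pool.allocate("juno")
```

After the call, `pool.held` contains the displaced job, which a later matching pass would place on another resource.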
NASA Technical Reports Server (NTRS)
Gangal, M. D.; Isenberg, L.; Lewis, E. V.
1985-01-01
Proposed system offers safety and large return on investment. System, operating by year 2000, employs machines and processes based on proven principles. According to concept, line of parallel machines, connected in groups of four to service modules, attacks face of coal seam. High-pressure water jets and central auger on each machine break face. Jaws scoop up coal chunks, and auger grinds them and forces fragments into slurry-transport system. Slurry pumped through pipeline to point of use. Concept for highly automated coal-mining system increases productivity, makes mining safer, and protects health of mine workers.
A spherical parallel three degrees-of-freedom robot for ankle-foot neuro-rehabilitation.
Malosio, Matteo; Negri, Simone Pio; Pedrocchi, Nicola; Vicentini, Federico; Caimmi, Marco; Molinari Tosatti, Lorenzo
2012-01-01
The ankle represents a fairly complex bone structure, resulting in kinematics that hinders a flawless robot-assisted recovery of foot motility in impaired subjects. The paper proposes a novel device for ankle-foot neuro-rehabilitation based on a mechatronic redesign of the remarkable Agile Eye spherical robot on the basis of clinical requisites. The kinematic design allows the positioning of the ankle articular center close to the machine rotation center, with valuable benefits in terms of therapy functions. The prototype, named PKAnkle (Parallel Kinematic machine for Ankle rehabilitation), provides a 6-axis load cell for measuring subject interaction forces/torques, and it integrates a commercial EMG-acquisition system. Robot control provides active and passive therapeutic exercises.
50 GFlops molecular dynamics on the Connection Machine 5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lomdahl, P.S.; Tamayo, P.; Groenbech-Jensen, N.
1993-12-31
The authors present timings and performance numbers for a new short range three dimensional (3D) molecular dynamics (MD) code, SPaSM, on the Connection Machine-5 (CM-5). They demonstrate that runs with more than 10^8 particles are now possible on massively parallel MIMD computers. To the best of their knowledge this is at least an order of magnitude more particles than what has previously been reported. Typical production runs show sustained performance (including communication) in the range of 47-50 GFlops on a 1024 node CM-5 with vector units (VUs). The speed of the code scales linearly with the number of processors and with the number of particles and shows 95% parallel efficiency in the speedup.
Implementation of the force decomposition machine for molecular dynamics simulations.
Borštnik, Urban; Miller, Benjamin T; Brooks, Bernard R; Janežič, Dušanka
2012-09-01
We present the design and implementation of the force decomposition machine (FDM), a cluster of personal computers (PCs) that is tailored to running molecular dynamics (MD) simulations using the distributed diagonal force decomposition (DDFD) parallelization method. The cluster interconnect architecture is optimized for the communication pattern of the DDFD method. Our implementation of the FDM relies on standard commodity components even for networking. Although the cluster is meant for DDFD MD simulations, it remains general enough for other parallel computations. An analysis of several MD simulation runs on both the FDM and a standard PC cluster demonstrates that the FDM's interconnect architecture provides a greater performance compared to a more general cluster interconnect. Copyright © 2012 Elsevier Inc. All rights reserved.
Better approximation guarantees for job-shop scheduling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Goldberg, L.A.; Paterson, M.; Srinivasan, A.
1997-06-01
Job-shop scheduling is a classical NP-hard problem. Shmoys, Stein & Wein presented the first polynomial-time approximation algorithm for this problem that has a good (polylogarithmic) approximation guarantee. We improve the approximation guarantee of their work, and present further improvements for some important NP-hard special cases of this problem (e.g., in the preemptive case where machines can suspend work on operations and later resume). We also present NC algorithms with improved approximation guarantees for some NP-hard special cases.
The Use of the MASCOT Philosophy for the Construction of Ada Programs,
1983-10-01
dependent units must be recompiled. Because of Ada’s commitment to abstract data types tasks are treated as data types with certain restrictions. A task...3.3.3.1.4 End of Slice Action The scheduling algorithm determines, for each type of Slice termination, how the Scheduler treats Activities whose Slice has...Pools. The MASCOT Machine treats them as constructionally equivalent (refer 3.3.1.1.1). Because of the constraints brought in by the formulation of
Full glowworm swarm optimization algorithm for whole-set orders scheduling in single machine.
Yu, Zhang; Yang, Xiaomei
2013-01-01
By analyzing the characteristics of the whole-set orders problem and combining them with the theory of glowworm swarm optimization, a new glowworm swarm optimization algorithm for scheduling is proposed. A new hybrid encoding schema that combines two-dimensional encoding with random-key encoding is given. To enhance the search capability and speed up convergence, a dynamically changing step-size strategy is integrated into the algorithm. Experimental results demonstrate its feasibility and efficiency.
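Random-key encoding, one half of the hybrid schema mentioned in this abstract, represents a job sequence as a vector of real numbers and decodes it by sorting. The sketch below shows only that decoding step; the function name and the sample keys are hypothetical, and the paper's two-dimensional encoding and step-size strategy are not reproduced.

```python
from typing import List, Sequence


def decode_random_keys(keys: Sequence[float]) -> List[int]:
    """Return the job permutation encoded by a random-key vector.

    Job i is scheduled earlier the smaller keys[i] is; ties keep index order
    because Python's sort is stable.
    """
    return sorted(range(len(keys)), key=lambda i: keys[i])


# Hypothetical key vector for four jobs.
sequence = decode_random_keys([0.71, 0.05, 0.33, 0.90])
```

The appeal of this encoding for swarm methods is that any real-valued position update still decodes to a valid permutation, so no repair step is needed.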
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...
1995-01-01
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
Pyramidal neurovision architecture for vision machines
NASA Astrophysics Data System (ADS)
Gupta, Madan M.; Knopf, George K.
1993-08-01
The vision system employed by an intelligent robot must be active; active in the sense that it must be capable of selectively acquiring the minimal amount of relevant information for a given task. An efficient active vision system architecture that is based loosely upon the parallel-hierarchical (pyramidal) structure of the biological visual pathway is presented in this paper. Although the computational architecture of the proposed pyramidal neuro-vision system is far less sophisticated than the architecture of the biological visual pathway, it does retain some essential features such as the converging multilayered structure of its biological counterpart. In terms of visual information processing, the neuro-vision system is constructed from a hierarchy of several interactive computational levels, whereupon each level contains one or more nonlinear parallel processors. Computationally efficient vision machines can be developed by utilizing both the parallel and serial information processing techniques within the pyramidal computing architecture. A computer simulation of a pyramidal vision system for active scene surveillance is presented.
Precision Parameter Estimation and Machine Learning
NASA Astrophysics Data System (ADS)
Wandelt, Benjamin D.
2008-12-01
I discuss the strategy of "Acceleration by Parallel Precomputation and Learning" (APPLe) that can vastly accelerate parameter estimation in high-dimensional parameter spaces and costly likelihood functions, using trivially parallel computing to speed up sequential exploration of parameter space. This strategy combines the power of distributed computing with machine learning and Markov-Chain Monte Carlo techniques efficiently to explore a likelihood function, posterior distribution or χ²-surface. This strategy is particularly successful in cases where computing the likelihood is costly and the number of parameters is moderate or large. We apply this technique to two central problems in cosmology: the solution of the cosmological parameter estimation problem with sufficient accuracy for the Planck data using PICo; and the detailed calculation of cosmological helium and hydrogen recombination with RICO. Since the APPLe approach is designed to be able to use massively parallel resources to speed up problems that are inherently serial, we can bring the power of distributed computing to bear on parameter estimation problems. We have demonstrated this with the CosmologyatHome project.
Parallel and Scalable Clustering and Classification for Big Data in Geosciences
NASA Astrophysics Data System (ADS)
Riedel, M.
2015-12-01
Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm that enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus set on the support vector machines (SVMs) algorithm, as one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
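To make the outlier-detection property of DBSCAN concrete, here is a minimal serial version of the textbook algorithm. It is a sketch only: it uses a brute-force neighbour search, it is not the parallel implementation the abstract refers to, and the parameter names `eps` and `min_pts` as well as the sample points are assumptions chosen for the example.

```python
from math import dist
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float,
           min_pts: int) -> List[Optional[int]]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def neighbours(i: int) -> List[int]:
        # Brute-force range query; the result includes i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point: expand through it
    return labels


# Hypothetical 2-D data: two tight groups and one isolated point.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Points that end with the label `NOISE` are the outliers or anomalies the abstract mentions. The quadratic neighbour search is the part a scalable implementation would replace with spatial indexing and data partitioning.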
Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling
NASA Technical Reports Server (NTRS)
Rios, Joseph Lucio; Ross, Kevin
2009-01-01
Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
Detecting opportunities for parallel observations on the Hubble Space Telescope
NASA Technical Reports Server (NTRS)
Lucks, Michael
1992-01-01
The presence of multiple scientific instruments aboard the Hubble Space Telescope provides opportunities for parallel science, i.e., the simultaneous use of different instruments for different observations. Determining whether candidate observations are suitable for parallel execution depends on numerous criteria (some involving quantitative tradeoffs) that may change frequently. A knowledge based approach is presented for constructing a scoring function to rank candidate pairs of observations for parallel science. In the Parallel Observation Matching System (POMS), spacecraft knowledge and schedulers' preferences are represented using a uniform set of mappings, or knowledge functions. Assessment of parallel science opportunities is achieved via composition of the knowledge functions in a prescribed manner. The knowledge acquisition, and explanation facilities of the system are presented. The methodology is applicable to many other multiple criteria assessment problems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, J.R. (Netrologic, Inc., San Diego, CA)
1988-01-01
Topics presented include integrating neural networks and expert systems, neural networks and signal processing, machine learning, cognition and avionics applications, artificial intelligence and man-machine interface issues, real time expert systems, artificial intelligence, and engineering applications. Also considered are advanced problem solving techniques, combinational optimization for scheduling and resource control, data fusion/sensor fusion, back propagation with momentum, shared weights and recurrency, automatic target recognition, cybernetics, optical neural networks.
A Knowledge-Based Approach for Item Exposure Control in Computerized Adaptive Testing
ERIC Educational Resources Information Center
Doong, Shing H.
2009-01-01
The purpose of this study is to investigate a functional relation between item exposure parameters (IEPs) and item parameters (IPs) over parallel pools. This functional relation is approximated by a well-known tool in machine learning. Let P and Q be parallel item pools and suppose IEPs for P have been obtained via a Sympson and Hetter-type…
Constitutive Model Calibration via Autonomous Multiaxial Experimentation (Postprint)
2016-09-17
test machine. Experimental data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain...data is reduced and finite element simulations are conducted in parallel with the test based on experimental strain conditions. Optimization methods...be used directly in finite element simulations of more complex geometries. Keywords Axial/torsional experimentation • Plasticity • Constitutive model
Automated Handling of Garments for Pressing
1991-09-30
Parallel Algorithms for 2D Kalman Filtering ................................. 47 DJ. Potter and M.P. Cline Hash Table and Sorted Array: A Case Study of... Kalman Filtering on the Connection Machine ............................ 55 MA. Palis and D.K. Krecker Parallel Sorting of Large Arrays on the MasPar...ALGORITHM’VS FOR SEAM SENSING. .. .. .. ... ... .... ..... 24 6.1 KarelTW Algorithms .. .. ... ... ... ... .... ... ...... 24 6.1.1 Image Filtering
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2011 CFR
2011-07-01
... paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28... the left, right or beyond the ATRS system, shall not exceed 5 feet. (e) Each ATRS system shall meet...
30 CFR 75.209 - Automated Temporary Roof Support (ATRS) systems.
Code of Federal Regulations, 2010 CFR
2010-07-01
... paragraph shall be met according to the following schedule: (1) All new machines ordered after March 28... the left, right or beyond the ATRS system, shall not exceed 5 feet. (e) Each ATRS system shall meet...
Information Processing Research.
1988-05-01
concentrated mainly on the Hitech chess machine, which achieves its success from parallelism in the right places. Hitech has now reached a National rating...includes local user workstations, a set of central server workstations each acting as a host for a Warp machine, and a few Warp multiprocessors. The... successful completion. A quorum for an operation is any such set of sites. Necessary and sufficient constraints on quorum intersections are derived
Lattice-Gas Automata Fluids on Parallel Supercomputers
1993-11-23
Kelvin-Helmholtz shear instability, and the Von Karman vortex shedding instability. Performance of the two machines in terms of both site update... PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) Phillips Laboratory, Hanscom Field, MA 01731 8. PERFORMING ORGANIZATION REPORT NUMBER 9. SPONSORING...Helmholtz shear instability, and the Von Karman vortex shedding instability. Performance of the two machines in terms of both site update rate and
Development of a Dynamic Time Sharing Scheduled Environment Final Report CRADA No. TC-824-94E
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jette, M.; Caliga, D.
Massively parallel computers, such as the Cray T3D, have historically supported resource sharing solely with space sharing. In that method, multiple problems are solved by executing them on distinct processors. This project developed a dynamic time- and space-sharing scheduler to achieve greater interactivity and throughput than could be achieved with space-sharing alone. CRI and LLNL worked together on the design, testing, and review aspects of this project. There were separate software deliverables. CRI implemented a general purpose scheduling system as per the design specifications. LLNL ported the local gang scheduler software to the LLNL Cray T3D. In this approach, processors are allocated simultaneously to all components of a parallel program (in a “gang”). Program execution is preempted as needed to provide for interactivity. Programs are also relocated to different processors as needed to efficiently pack the computer’s torus of processors. In phase one, CRI developed an interface specification after discussions with LLNL for system-level software supporting a time- and space-sharing environment on the LLNL T3D. The two parties also discussed interface specifications for external control tools (such as scheduling policy tools, system administration tools) and applications programs. CRI assumed responsibility for the writing and implementation of all the necessary system software in this phase. In phase two, CRI implemented job-rolling on the Cray T3D, a mechanism for preempting a program, saving its state to disk, and later restoring its state to memory for continued execution. LLNL ported its gang scheduler to the LLNL T3D utilizing the CRI interface implemented in phases one and two. During phase three, the functionality and effectiveness of the LLNL gang scheduler were assessed to provide input to CRI time- and space-sharing efforts. CRI will utilize this information in the development of general schedulers suitable for other sites and future architectures.
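The gang idea in the abstract above (all components of a parallel program get processors at the same time, and programs are preempted between time slices) can be illustrated with a minimal sketch. It is not the LLNL or CRI scheduler: the function name `build_gang_slots`, the first-fit packing rule, and the flat processor numbering (no torus topology) are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple


def build_gang_slots(jobs: List[Tuple[str, int]], n_procs: int) -> List[Dict[str, List[int]]]:
    """Pack jobs into time slots so every job gets all its processors at once.

    jobs    -- (name, processors_needed) pairs, in arrival order (hypothetical input)
    n_procs -- total processors in the machine
    Returns one dict per time slot, mapping job name -> processor ids.
    """
    slots: List[dict] = []  # each slot tracks its free processors and placed jobs
    for name, need in jobs:
        if need > n_procs:
            raise ValueError(f"job {name!r} needs more processors than exist")
        # First fit: reuse the earliest slot with enough free processors.
        for slot in slots:
            if len(slot["free"]) >= need:
                break
        else:
            # No existing slot fits, so open a new time slice.
            slot = {"free": list(range(n_procs)), "jobs": {}}
            slots.append(slot)
        slot["jobs"][name] = slot["free"][:need]
        slot["free"] = slot["free"][need:]
    return [slot["jobs"] for slot in slots]


schedule = build_gang_slots([("A", 4), ("B", 2), ("C", 2), ("D", 3)], n_procs=4)
for index, slot in enumerate(schedule):
    print(f"time slice {index}: {slot}")
```

A time-sharing scheduler in this style would cycle through the slots, preempting every job of one slot before starting the next; jobs that share a slot are space-shared within it.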
Parallel and Distributed Systems for Probabilistic Reasoning
2012-12-01
work at CMU I had the opportunity to work with Andreas Krause on Gaussian process models for signal quality estimation in wireless sensor networks ...we reviewed the natural parallelization of the belief propagation algorithm using the synchronous schedule and demonstrated both theoretically and...problem is that the power-law sparsity structure, commonly found in graphs derived from natural phenomena (e.g., social networks and the web
Compiler and Runtime Support for Programming in Adaptive Parallel Environments
1998-10-15
no other job is waiting for resources, and use a smaller number of processors when other jobs need resources. Setia et al. [15, 20] have shown that such...[15] Vijay K. Naik, Sanjeev Setia, and Mark Squillante. Performance analysis of job scheduling policies in parallel supercomputing environments. In...on networks of heterogeneous workstations. Technical Report CSE-94-012, Oregon Graduate Institute of Science and Technology, 1994. [20] Sanjeev Setia
PRAIS: Distributed, real-time knowledge-based systems made easy
NASA Technical Reports Server (NTRS)
Goldstein, David G.
1990-01-01
This paper discusses an architecture for real-time, distributed (parallel) knowledge-based systems called the Parallel Real-time Artificial Intelligence System (PRAIS). PRAIS strives for transparently parallelizing production (rule-based) systems, even when under real-time constraints. PRAIS accomplishes these goals by incorporating a dynamic task scheduler, operating system extensions for fact handling, and message-passing among multiple copies of CLIPS executing on a virtual blackboard. This distributed knowledge-based system tool uses the portability of CLIPS and common message-passing protocols to operate over a heterogeneous network of processors.
Computer-aided programming for message-passing system; Problems and a solution
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, M.Y.; Gajski, D.D.
1989-12-01
As the number of processors and the complexity of problems to be solved increase, programming multiprocessing systems becomes more difficult and error-prone. Program development tools are necessary since programmers are not able to develop complex parallel programs efficiently. Parallel models of computation, parallelization problems, and tools for computer-aided programming (CAP) are discussed. As an example, a CAP tool that performs scheduling and inserts communication primitives automatically is described. It also generates the performance estimates and other program quality measures to help programmers in improving their algorithms and programs.
Real-Time MENTAT programming language and architecture
NASA Technical Reports Server (NTRS)
Grimshaw, Andrew S.; Silberman, Ami; Liu, Jane W. S.
1989-01-01
Real-time MENTAT, a programming environment designed to simplify the task of programming real-time applications in distributed and parallel environments, is described. It is based on the same data-driven computation model and object-oriented programming paradigm as MENTAT. It provides an easy-to-use mechanism to exploit parallelism, language constructs for the expression and enforcement of timing constraints, and run-time support for scheduling and executing real-time programs. The real-time MENTAT programming language is an extended C++. The extensions are added to facilitate automatic detection of data flow and generation of data flow graphs, to express the timing constraints of individual granules of computation, and to provide scheduling directives for the runtime system. A high-level view of the real-time MENTAT system architecture and programming language constructs is provided.
JIGSAW: Preference-directed, co-operative scheduling
NASA Technical Reports Server (NTRS)
Linden, Theodore A.; Gaw, David
1992-01-01
Techniques that enable humans and machines to cooperate in the solution of complex scheduling problems have evolved out of work on the daily allocation and scheduling of Tactical Air Force resources. A generalized, formal model of these applied techniques is being developed. It is called JIGSAW by analogy with the multi-agent, constructive process used when solving jigsaw puzzles. JIGSAW begins from this analogy and extends it by propagating local preferences into global statistics that dynamically influence the value and variable ordering decisions. The statistical projections also apply to abstract resources and time periods--allowing more opportunities to find a successful variable ordering by reserving abstract resources and deferring the choice of a specific resource or time period.
Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.
Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel
2014-01-01
With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
DORCA computer program. Volume 1: User's guide
NASA Technical Reports Server (NTRS)
Wray, S. T., Jr.
1971-01-01
The Dynamic Operational Requirements and Cost Analysis Program (DORCA) was written to provide a top level analysis tool for NASA. DORCA relies on a man-machine interaction to optimize results based on external criteria. DORCA relies heavily on outside sources to provide cost information and vehicle performance parameters as the program does not determine these quantities but rather uses them. Given data describing missions, vehicles, payloads, containers, space facilities, schedules, cost values and costing procedures, the program computes flight schedules, cargo manifests, vehicle fleet requirements, acquisition schedules and cost summaries. The program is designed to consider the Earth Orbit, Lunar, Interplanetary and Automated Satellite Programs. A general outline of the capabilities of the program is provided.
The BLAZE language: A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, P.; Vanrosendale, J.
1985-01-01
A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.
A Solution Method of Job-shop Scheduling Problems by the Idle Time Shortening Type Genetic Algorithm
NASA Astrophysics Data System (ADS)
Ida, Kenichi; Osawa, Akira
In this paper, we propose a new idle time shortening method for job-shop scheduling problems (JSPs) and insert this method into a genetic algorithm (GA). The purpose of the JSP is to find a schedule with the minimum makespan. We suppose that reducing the idle time of a machine is effective in improving the makespan. The left shift is a well-known existing algorithm for shortening idle time. However, the left shift cannot always move work into idle time, so some idle times are not shortened by it. We propose two kinds of algorithms which shorten such idle time. Next, we combine these algorithms with the reversal of a schedule. We apply the GA with this combined algorithm to benchmark problems and show its effectiveness.
Implementing Molecular Dynamics for Hybrid High Performance Computers - 1. Short Range Forces
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, W Michael; Wang, Peng; Plimpton, Steven J
The use of accelerators such as general-purpose graphics processing units (GPGPUs) has become popular in scientific computing applications due to their low cost, impressive floating-point capabilities, high memory bandwidth, and low electrical power requirements. Hybrid high performance computers, machines with more than one type of floating-point processor, are now becoming more prevalent due to these advantages. In this work, we discuss several important issues in porting a large molecular dynamics code for use on parallel hybrid machines: 1) choosing a hybrid parallel decomposition that works on central processing units (CPUs) with distributed memory and accelerator cores with shared memory, 2) minimizing the amount of code that must be ported for efficient acceleration, 3) utilizing the available processing power from both many-core CPUs and accelerators, and 4) choosing a programming model for acceleration. We present our solution to each of these issues for short-range force calculation in the molecular dynamics package LAMMPS. We describe algorithms for efficient short range force calculation on hybrid high performance machines. We describe a new approach for dynamic load balancing of work between CPU and accelerator cores. We describe the Geryon library that allows a single code to compile with both CUDA and OpenCL for use on a variety of accelerators. Finally, we present results on a parallel test cluster containing 32 Fermi GPGPUs and 180 CPU cores.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
NASA Technical Reports Server (NTRS)
Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization, 3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction and 6) machine specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them. The demonstration will exhibit our latest research in building this environment: 1. Parallelizing tools and compiler evaluation. 2. Code cleanup and serial optimization using automated scripts. 3. Development of a code generator for performance prediction. 4. Automated partitioning. 5. Automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.
Zhang, Jilin; Tu, Hangdi; Ren, Yongjian; Wan, Jian; Zhou, Li; Li, Mingwei; Wang, Jue; Yu, Lifeng; Zhao, Chang; Zhang, Lei
2017-09-21
In order to utilize the distributed characteristic of sensors, distributed machine learning has become the mainstream approach, but the different computing capability of sensors and network delays greatly influence the accuracy and the convergence rate of the machine learning model. Our paper describes a reasonable parameter communication optimization strategy to balance the training overhead and the communication overhead. We extend the fault tolerance of iterative-convergent machine learning algorithms and propose the Dynamic Finite Fault Tolerance (DFFT). Based on the DFFT, we implement a parameter communication optimization strategy for distributed machine learning, named Dynamic Synchronous Parallel Strategy (DSP), which uses the performance monitoring model to dynamically adjust the parameter synchronization strategy between worker nodes and the Parameter Server (PS). This strategy makes full use of the computing power of each sensor, ensures the accuracy of the machine learning model, and avoids the situation that the model training is disturbed by any tasks unrelated to the sensors.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Randolph, Randall Blaine; Oertel, John A.; Schmidt, Derek William
For this study, machined CH hemi-shell ablator capsules have been successfully produced by the MST-7 Target Fabrication Team at Los Alamos National Laboratory. Process development and micro-machining techniques have been developed to produce capsules for both the Omega and National Ignition Facility (NIF) campaigns. These capsules are gas filled up to 10 atm and consist of a machined plastic hemi-shell outer layer that accommodates various specially engineered low-density polystyrene foam cores. Machining and assembly of the two-part, step-jointed plastic hemi-shell outer layer required development of new techniques, processes, and tooling while still meeting very aggressive shot schedules for both campaigns. Finally, problems encountered and process improvements will be discussed that describe this very unique, complex capsule design approach through the first Omega proof-of-concept version to the larger NIF version.
Evaluating SPLASH-2 Applications Using MapReduce
NASA Astrophysics Data System (ADS)
Zhu, Shengkai; Xiao, Zhiwei; Chen, Haibo; Chen, Rong; Zhang, Weihua; Zang, Binyu
MapReduce has been prevalent for running data-parallel applications. By hiding non-functional concerns such as parallelism, fault tolerance and load balance from programmers, MapReduce significantly simplifies the programming of large clusters. Because of these features, researchers have also explored the use of MapReduce in other application domains, such as machine learning, textual retrieval and statistical translation, among others.
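The programming model referred to in the abstract above can be illustrated with a minimal single-process sketch. It only shows the map, group-by-key, and reduce phases; the parallelism, fault tolerance, and load balancing that a real MapReduce runtime hides are deliberately absent, and the names `map_reduce`, `word_count_mapper`, and `sum_reducer` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle (group by key), and reduce sequentially in one process."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for item in inputs:
        for key, value in mapper(item):  # map phase emits (key, value) pairs
            groups[key].append(value)    # shuffle phase groups values by key
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up all counts emitted for one word."""
    return sum(values)


counts = map_reduce(["map reduce map", "reduce data"], word_count_mapper, sum_reducer)
print(counts)
```

The programmer supplies only the mapper and the reducer; everything inside `map_reduce` is what a cluster runtime would distribute across machines.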
Automatic Adaptation of Tunable Distributed Applications
2001-01-01
size, weight, and battery life, with a single CPU, less memory, smaller hard disk, and lower bandwidth network connectivity. The power of PDAs is...wireless, and bluetooth [32] facilities; thus achieving different rates of data transmission. 1 With the trend of “write once, run everywhere...applications, a single component can execute on multiple processors (or machines) in parallel. These parallel applications, written in a specialized language
Effective switching frequency multiplier inverter
Su, Gui-Jia [Oak Ridge, TN; Peng, Fang Z [Okemos, MI
2007-08-07
A switching frequency multiplier inverter for low inductance machines uses a parallel connection of switches, with each switch independently controlled according to a pulse width modulation scheme. The effective switching frequency is multiplied by the number of switches connected in parallel while each individual switch operates within its limit of switching frequency. This technique can also be used for other power converters such as DC/DC and AC/DC converters.
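The frequency relation stated in the abstract above can be checked with a short numerical sketch. The equal phase spacing between switches is an assumption chosen for illustration (the abstract says only that each switch is independently controlled), and both function names are hypothetical.

```python
from typing import List, Tuple


def effective_switching_frequency(per_switch_hz: float, n_switches: int) -> float:
    """Effective frequency seen by the load: per-switch frequency times switch count."""
    if n_switches < 1:
        raise ValueError("need at least one switch")
    return per_switch_hz * n_switches


def interleaved_pulse_starts(
    per_switch_hz: float, n_switches: int, n_periods: int = 1
) -> List[Tuple[float, int]]:
    """Return sorted (start_time_s, switch_index) pairs for evenly interleaved switches.

    Assumption: switch s is delayed by s / (n_switches * per_switch_hz) seconds.
    """
    period = 1.0 / per_switch_hz
    offset = period / n_switches
    starts = [
        (k * period + s * offset, s)
        for k in range(n_periods)
        for s in range(n_switches)
    ]
    return sorted(starts)


print(effective_switching_frequency(10_000.0, 4))
for start_time, switch in interleaved_pulse_starts(10_000.0, 4):
    print(f"switch {switch} turns on at {start_time * 1e6:.1f} us")
```

With four switches at 10 kHz each, the pulses are spaced a quarter of a switch period apart, so the load sees a pulse train at four times the frequency at which any single switch operates.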
Distributed communications and control network for robotic mining
NASA Technical Reports Server (NTRS)
Schiffbauer, William H.
1989-01-01
The application of robotics to coal mining machines is one approach pursued to increase productivity while providing enhanced safety for the coal miner. Toward that end, a network composed of microcontrollers, computers, expert systems, real time operating systems, and a variety of program languages is being integrated that will act as the backbone for intelligent machine operation. Actual mining machines, including a few customized ones, have been given telerobotic semiautonomous capabilities by applying the described network. Control devices, intelligent sensors and computers onboard these machines are showing promise of achieving improved mining productivity and safety benefits. Current research using these machines involves navigation, multiple machine interaction, machine diagnostics, mineral detection, and graphical machine representation. Guidance sensors and systems employed include: sonar, laser rangers, gyroscopes, magnetometers, clinometers, and accelerometers. Information on the network of hardware/software and its implementation on mining machines is presented. Anticipated coal production operations using the network are discussed. A parallel is also drawn between the direction of present day underground coal mining research and how the lunar soil (regolith) may be mined. A conceptual lunar mining operation that employs a distributed communication and control network is detailed.
Games in the Brain: Neural Substrates of Gambling Addiction.
Murch, W Spencer; Clark, Luke
2016-10-01
As a popular form of recreational risk taking, gambling games offer a paradigm for decision neuroscience research. As an individual behavior, gambling becomes dysfunctional in a subset of the population, with debilitating consequences. Gambling disorder has been recently reconceptualized as a "behavioral addiction" in the DSM-5, based on emerging parallels with substance use disorders. Why do some individuals undergo this transition from recreational to disordered gambling? The biomedical model of problem gambling is a "brain disorder" account that posits an underlying neurobiological abnormality. This article first delineates the neural circuitry that underpins gambling-related decision making, comprising ventral striatum, ventromedial prefrontal cortex, dopaminergic midbrain, and insula, and presents evidence for pathophysiology in this circuitry in gambling disorder. These biological dispositions become translated into clinical disorder through the effects of gambling games. This influence is better articulated in a public health approach that describes the interplay between the player and the (gambling) product. Certain forms of gambling, including electronic gambling machines, appear to be overrepresented in problem gamblers. These games harness psychological features, including variable ratio schedules, near-misses, "losses disguised as wins," and the illusion of control, which modulate the core decision-making circuitry that is perturbed in gambling disorder. © The Author(s) 2015.
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheeler's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware, OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
A framework for grand scale parallelization of the combined finite discrete element method in 2d
NASA Astrophysics Data System (ADS)
Lei, Z.; Rougier, E.; Knight, E. E.; Munjiza, A.
2014-09-01
Within the context of rock mechanics, the Combined Finite-Discrete Element Method (FDEM) has been applied to many complex industrial problems such as block caving, deep mining techniques (tunneling, pillar strength, etc.), rock blasting, seismic wave propagation, packing problems, dam stability, rock slope stability, rock mass strength characterization problems, etc. The reality is that most of these were accomplished in a 2D and/or single processor realm. In this work a hardware independent FDEM parallelization framework has been developed using the Virtual Parallel Machine for FDEM, (V-FDEM). With V-FDEM, a parallel FDEM software can be adapted to different parallel architecture systems ranging from just a few to thousands of cores.
Limpanuparb, Taweetham; Milthorpe, Josh; Rendell, Alistair P
2014-10-30
Use of the modern parallel programming language X10 for computing long-range Coulomb and exchange interactions is presented. By using X10, a partitioned global address space language with support for task parallelism and the explicit representation of data locality, the resolution of the Ewald operator can be parallelized in a straightforward manner including use of both intranode and internode parallelism. We evaluate four different schemes for dynamic load balancing of integral calculation using X10's work stealing runtime, and report performance results for long-range HF energy calculation of large molecule/high quality basis running on up to 1024 cores of a high performance cluster machine. Copyright © 2014 Wiley Periodicals, Inc.
Information Processing Research
1988-01-01
the Hitech chess machine, which achieves its success from parallelism in the right places. Hitech has now reached a National rating of 2359, making it...outset that success depended on building real systems and subjecting them to use by a large number of faculty and students within the Department. We...central server workstations each acting as a host for a Warp machine, and a few Warp multiprocessors. The command interpreter is executed in Lisp on
Center for Parallel Optimization.
1996-03-19
A NEW OPTIMIZATION BASED APPROACH TO IMPROVING GENERALIZATION IN MACHINE LEARNING HAS BEEN PROPOSED AND COMPUTATIONALLY VALIDATED ON SIMPLE LINEAR MODELS AS WELL AS ON HIGHLY NONLINEAR SYSTEMS SUCH AS NEURAL NETWORKS.
Improvement of the COP of the LiBr-Water Double-Effect Absorption Cycles
NASA Astrophysics Data System (ADS)
Shitara, Atsushi
Preventing global warming has made energy saving a pressing necessity, and this applies to improving the COP of absorption chiller-heaters. We started development of a high-efficiency gas-fired double-effect absorption chiller-heater using LiBr-H2O to achieve the target performance in the short or medium term. To maintain marketability, the volume of the high-efficiency machine has been set equal to or below that of the conventional machine. This development requires absorption cycle technology for improving the COP and element technology for downsizing the machine. In this study, the former is investigated. In this report, the target performance is first set at a cooling COP of 1.35 (on HHV), which is 0.35 higher than the COP of 1.0 for conventional machines on the market. This COP of 1.35 is practically close to the maximum achievable by a double-effect absorption chiller-heater. Next, the design condition of each element needed to achieve the target performance and the effect of each means of improving the COP are investigated. Moreover, comparing the various flows (series, parallel, reverse) to which each means is applied shows that the optimum cycle is the parallel flow.
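The size of the targeted gain can be worked out from the two COP values quoted above. The arithmetic below assumes the usual definition of cooling COP as cooling output divided by heat input, and a fixed cooling load; the symbols $Q_c$ and $Q_{\text{in}}$ are introduced here only for illustration.

```latex
\[
\mathrm{COP} = \frac{Q_c}{Q_{\text{in}}}, \qquad
\frac{\mathrm{COP}_{\text{target}} - \mathrm{COP}_{\text{conv}}}{\mathrm{COP}_{\text{conv}}}
  = \frac{1.35 - 1.0}{1.0} = 0.35,
\qquad
\frac{Q_{\text{in,target}}}{Q_{\text{in,conv}}}
  = \frac{\mathrm{COP}_{\text{conv}}}{\mathrm{COP}_{\text{target}}}
  = \frac{1.0}{1.35} \approx 0.74 .
\]
```

Under these assumptions, a machine that reaches the target would deliver 35% more cooling per unit of heat input, or equivalently need about 26% less heat input for the same cooling load.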
Performance Evaluation and Modeling Techniques for Parallel Processors. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Dimpsey, Robert Tod
1992-01-01
In practice, the performance evaluation of supercomputers is still substantially driven by single-point estimates of metrics (e.g., MFLOPS) obtained by running characteristic benchmarks or workloads. With the rapid increase in the use of time-shared multiprogramming in these systems, such measurements are clearly inadequate. This is because multiprogramming and system overhead, as well as other degradations in performance due to time varying characteristics of workloads, are not taken into account. In multiprogrammed environments, multiple jobs and users can dramatically increase the amount of system overhead and degrade the performance of the machine. Performance techniques, such as benchmarking, which characterize performance on a dedicated machine ignore this major component of true computer performance. Due to the complexity of analysis, there has been little work done in analyzing, modeling, and predicting the performance of applications in multiprogrammed environments. This is especially true for parallel processors, where the costs and benefits of multi-user workloads are exacerbated. While some may claim that the issue of multiprogramming is not a viable one in the supercomputer market, experience shows otherwise. Even in recent massively parallel machines, multiprogramming is a key component. It has even been claimed that a partial cause of the demise of the CM2 was the fact that it did not efficiently support time-sharing. In the same paper, Gordon Bell postulates that multicomputers will evolve to multiprocessors in order to support efficient multiprogramming. Therefore, it is clear that parallel processors of the future will be required to offer the user a time-shared environment with reasonable response times for the applications. In this type of environment, the most important performance metric is the completion or response time of a given application. However, there are few evaluation efforts addressing this issue.
A parallel computing engine for a class of time critical processes.
Nabhan, T M; Zomaya, A Y
1997-01-01
This paper focuses on the efficient parallel implementation of numerically intensive systems over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constraints. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.
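The mapping step in the abstract above, simulated annealing that assigns task-graph vertices to processors, can be sketched as follows. This is not the PCE scheduler: the cost function (heaviest processor load plus the weight of edges that cross processors) is a simplified stand-in that ignores network topology and congestion, and all names and parameter values are assumptions for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (task_a, task_b, communication_cost)


def mapping_cost(mapping: Sequence[int], task_work: Sequence[float],
                 edges: Sequence[Edge], speeds: Sequence[float]) -> float:
    """Simplified cost: slowest processor's compute time plus cross-processor traffic."""
    loads = [0.0] * len(speeds)
    for task, proc in enumerate(mapping):
        loads[proc] += task_work[task] / speeds[proc]
    comm = sum(w for a, b, w in edges if mapping[a] != mapping[b])
    return max(loads) + comm


def anneal_mapping(task_work: Sequence[float], edges: Sequence[Edge],
                   speeds: Sequence[float], steps: int = 5000, t0: float = 10.0,
                   cooling: float = 0.999, seed: int = 0) -> Tuple[List[int], float]:
    """Search for a low-cost task-to-processor mapping by simulated annealing."""
    rng = random.Random(seed)
    current = [0] * len(task_work)  # start with every task on processor 0
    current_cost = mapping_cost(current, task_work, edges, speeds)
    best, best_cost = list(current), current_cost
    temp = t0
    for _ in range(steps):
        task = rng.randrange(len(task_work))
        new_proc = rng.randrange(len(speeds))
        old_proc = current[task]
        if new_proc != old_proc:
            current[task] = new_proc  # propose moving one task
            new_cost = mapping_cost(current, task_work, edges, speeds)
            delta = new_cost - current_cost
            # Accept improvements always, worse moves with Boltzmann probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current_cost = new_cost
                if new_cost < best_cost:
                    best, best_cost = list(current), new_cost
            else:
                current[task] = old_proc  # reject: undo the move
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


work = [4.0, 4.0, 4.0, 4.0]
graph = [(0, 1, 1.0), (2, 3, 1.0)]
mapping, cost = anneal_mapping(work, graph, speeds=[1.0, 1.0])
print(mapping, cost)
```

Accepting some cost-increasing moves while the temperature is high lets the search leave poor local minima; as the temperature falls, the procedure behaves increasingly like a greedy descent.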
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications
NASA Technical Reports Server (NTRS)
Povitsky, Alex; Morris, Philip J.
1999-01-01
In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm, processors are idle due to the forward and backward recurrences of the algorithm. To utilize processors during this time, we propose to use them for either non-local data-independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
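For readers unfamiliar with the Thomas algorithm named in this abstract, the following is a minimal serial sketch for a tridiagonal system. It is not the authors' parallel code; the function name `thomas_solve` and the array layout are assumptions chosen for illustration. The two loops are the forward and backward recurrences, each step depending on the previous one, which is the dependency the abstract cites as the cause of idle processors in a pipelined implementation.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system A x = rhs (serial Thomas algorithm).

    lower[i] is A[i][i-1] (lower[0] is ignored),
    diag[i]  is A[i][i],
    upper[i] is A[i][i+1] (upper[-1] is ignored).
    No pivoting is done, so the matrix is assumed to be
    diagonally dominant or otherwise safe to eliminate in order.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward recurrence: eliminate the sub-diagonal row by row.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom if i < n - 1 else 0.0
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Backward recurrence: back-substitute from the last unknown.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x


if __name__ == "__main__":
    # 3x3 system with 2 on the diagonal and -1 off the diagonal.
    print(thomas_solve([0.0, -1.0, -1.0], [2.0, 2.0, 2.0],
                       [-1.0, -1.0, 0.0], [0.0, 0.0, 4.0]))
```

Because step i of each loop needs the result of step i-1 (or i+1), a line split across processors can only be worked on by one processor at a time unless other work is scheduled into the gaps.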
Design of robotic cells based on relative handling modules with use of SolidWorks system
NASA Astrophysics Data System (ADS)
Gaponenko, E. V.; Anciferov, S. I.
2018-05-01
The article presents a schematic engineering design of a robotic cell with six degrees of freedom for machining complex parts, consisting of a base with a tool installation module and a part machining module built as parallel-structure mechanisms. The output links of the part machining module and the tool installation module can each move along the X, Y, and Z coordinate axes. A 3D model of the complex is designed in the SolidWorks system. It will be used further for carrying out engineering calculations and mathematical analysis and for obtaining all required documentation.
Code of Federal Regulations, 2014 CFR
2014-07-01
... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... form of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g... programs of scholarly research. (5) Non-commercial scientific institution means an institution that is not...
Code of Federal Regulations, 2012 CFR
2012-07-01
... operating duplicating machinery. Not included in direct costs are overhead expenses such as costs of space... form of paper copy, microform, audio-visual materials, or machine-readable documentation (e.g... programs of scholarly research. (5) Non-commercial scientific institution means an institution that is not...
Rural Renaissance. Revitalizing Small High Schools.
ERIC Educational Resources Information Center
Ford, Edmund A.
Written in 1961, this document presents the rationales and applications of what were and still are, in most instances, considered innovative practices. Subjects discussed are building designs, teaching machines, educational television, flexible scheduling, multiple classes and small-group techniques, teacher assistants, shared services, and…
A Study on Real-Time Scheduling Methods in Holonic Manufacturing Systems
NASA Astrophysics Data System (ADS)
Iwamura, Koji; Tanimizu, Yoshitaka; Sugimura, Nobuhiro
Recently, new architectures of manufacturing systems have been proposed to realize flexible control structures that can cope with dynamic changes in the volume and variety of products and with unforeseen disruptions, such as failures of manufacturing resources and interruptions by high-priority jobs. These architectures are known as the autonomous distributed manufacturing system, the biological manufacturing system, and the holonic manufacturing system. Rule-based scheduling methods were proposed and applied to the real-time production scheduling problems of the HMS (Holonic Manufacturing System) in the previous report. However, problems remain from the viewpoint of optimizing the production schedules as a whole. New procedures are proposed in the present paper to select production schedules, aimed at generating effective production schedules in real time. The proposed methods enable the individual holons to select suitable machining operations to be carried out in the next time period. A coordination process among the holons is also proposed, based on the effectiveness values of the individual holons.
Srivastava, Shubhika; Allada, Vivekanand; Younoszai, Adel; Lopez, Leo; Soriano, Brian D; Fleishman, Craig E; Van Hoever, Andrea M; Lai, Wyman W
2016-10-01
The American Society of Echocardiography Committee on Pediatric Echocardiography Laboratory Productivity aimed to study factors that could influence the clinical productivity of physicians and sonographers and assess longitudinal trends for the same. The first survey results indicated that productivity correlated with the total volume of echocardiograms. Survey questions were designed to assess productivity for (1) physician full-time equivalent (FTE) allocated to echocardiography reading (echocardiograms per physician FTE per day), (2) sonographer FTE (echocardiograms per sonographer FTE per year), and (3) machine utilization (echocardiograms per machine per year). Questions were also posed to assess work flow and workforce. For fiscal year 2013 or academic year 2012-2013, the mean number of total echocardiograms-including outreach, transthoracic, fetal, and transesophageal echocardiograms-per physician FTE per day was 14.3 ± 5.9, the mean number of echocardiograms per sonographer FTE per year was 1,056 ± 441, and the mean number of echocardiograms per machine per year was 778 ± 303. Both physician and sonographer productivity was higher at high-volume surgical centers and with echocardiography slots scheduled concordantly with clinic visits. Having an advanced imaging fellow and outpatient sedation correlated negatively with clinical laboratory productivity. Machine utilization was greater in laboratories with higher sonographer and physician productivity and lower for machines obtained before 2009. Measures of pediatric echocardiography laboratory staff productivity and machine utilization were shown to correlate positively with surgical volume, total echocardiography volumes, and concordant echocardiography scheduling; the same measures correlated negatively with having an advanced imaging fellow and outpatient sedation. 
There has been no significant change in staff productivity noted over two Committee on Pediatric Echocardiography Laboratory Productivity survey cycles, suggesting that hiring practices have matched laboratory volume increases. Copyright © 2016 American Society of Echocardiography. Published by Elsevier Inc. All rights reserved.
Self-balanced modulation and magnetic rebalancing method for parallel multilevel inverters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Hui; Shi, Yanjun
A self-balanced modulation method and a closed-loop magnetic flux rebalancing control method for parallel multilevel inverters. The combination of the two methods provides for balancing of the magnetic flux of the inter-cell transformers (ICTs) of the parallel multilevel inverters without deteriorating the quality of the output voltage. In various embodiments a parallel multi-level inverter modulator is provided, including a multi-channel comparator to generate a multiplexed digitized ideal waveform for a parallel multi-level inverter and a finite state machine (FSM) module coupled to the parallel multi-channel comparator, the FSM module to receive the multiplexed digitized ideal waveform and to generate a pulse width modulated gate-drive signal for each switching device of the parallel multi-level inverter. The system and method provide for optimization of the output voltage spectrum without influencing the magnetic balancing.
The BLAZE language - A parallel language for scientific programming
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1987-01-01
A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wayne F. Boyer; Gurdeep S. Hura
2005-09-01
The problem of obtaining an optimal matching and scheduling of interdependent tasks in distributed heterogeneous computing (DHC) environments is well known to be NP-hard. In a DHC system, task execution time is dependent on the machine to which it is assigned, and task precedence constraints are represented by a directed acyclic graph. Recent research in evolutionary techniques has shown that genetic algorithms usually obtain more efficient schedules than other known algorithms. We propose a non-evolutionary random scheduling (RS) algorithm for efficient matching and scheduling of interdependent tasks in a DHC system. RS is a succession of randomized task orderings and a heuristic mapping from task order to schedule. Randomized task ordering is effectively a topological sort where the outcome may be any possible task order for which the task precedence constraints are maintained. A detailed comparison to existing evolutionary techniques (GA and PSGA) shows the proposed algorithm is less complex than evolutionary techniques, computes schedules in less time, and requires less memory and fewer tuning parameters. Simulation results show that the average schedules produced by RS are approximately as efficient as PSGA schedules for all cases studied and clearly more efficient than PSGA for certain cases. The standard formulation for the scheduling problem addressed in this paper is Rm|prec|Cmax.
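The two ingredients named in this abstract, a randomized topological sort and a heuristic mapping from task order to schedule, can be sketched as follows. This is an illustration under stated assumptions, not the authors' RS implementation: the abstract does not say which mapping heuristic is used, so an earliest-finish-time rule is assumed here, communication costs are ignored, and all function names and the data layout (`preds`, `exec_time`) are hypothetical.

```python
import random


def random_topological_order(preds, rng):
    """Return a random task order that respects the precedence DAG.

    preds maps each task to the tasks that must finish before it.
    """
    remaining = {task: set(p) for task, p in preds.items()}
    ready = sorted(task for task, p in remaining.items() if not p)
    order = []
    while ready:
        # Any ready task may come next, so every valid order is reachable.
        task = ready.pop(rng.randrange(len(ready)))
        order.append(task)
        for other, p in remaining.items():
            if task in p:
                p.remove(task)
                if not p:
                    ready.append(other)
    if len(order) != len(preds):
        raise ValueError("precedence graph contains a cycle")
    return order


def earliest_finish_schedule(order, preds, exec_time, n_machines):
    """Assumed heuristic: put each task on the machine where it finishes first.

    exec_time[task][m] is the run time of task on machine m (unrelated machines).
    """
    machine_free = [0.0] * n_machines
    finish, assignment = {}, {}
    for task in order:
        ready_at = max((finish[p] for p in preds[task]), default=0.0)
        best = min(range(n_machines),
                   key=lambda m: max(machine_free[m], ready_at) + exec_time[task][m])
        finish[task] = max(machine_free[best], ready_at) + exec_time[task][best]
        machine_free[best] = finish[task]
        assignment[task] = best
    return max(finish.values()), assignment


def random_scheduling(preds, exec_time, n_machines, trials=200, seed=0):
    """Try many random orders and keep the schedule with the smallest makespan."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        order = random_topological_order(preds, rng)
        makespan, assignment = earliest_finish_schedule(order, preds, exec_time, n_machines)
        if best is None or makespan < best[0]:
            best = (makespan, order, assignment)
    return best


if __name__ == "__main__":
    preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
    exec_time = {"a": [2, 3], "b": [3, 1], "c": [2, 4], "d": [1, 2]}
    print(random_scheduling(preds, exec_time, n_machines=2))
```

The only tuning parameter in this sketch is the number of trials, which is in the spirit of the abstract's claim that RS needs fewer tuning parameters than the genetic approaches.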
NASA Astrophysics Data System (ADS)
Iwamura, Koji; Kuwahara, Shinya; Tanimizu, Yoshitaka; Sugimura, Nobuhiro
Recently, new distributed architectures of manufacturing systems have been proposed, aiming at realizing more flexible control structures of the manufacturing systems. Much research has been carried out on distributed architectures for planning and control of manufacturing systems. However, human operators have not yet been considered as autonomous components of distributed manufacturing systems. A real-time scheduling method is proposed in this research to select suitable combinations of the human operators, the resources, and the jobs for the manufacturing processes. The proposed scheduling method consists of the following three steps. In the first step, the human operators select their favorite manufacturing processes, which they will carry out in the next time period, based on their preferences. In the second step, the machine tools and the jobs select suitable combinations for the next machining processes. In the third step, the automated guided vehicles and the jobs select suitable combinations for the next transportation processes. The second and third steps are carried out by using the utility-value-based method and the dispatching-rule-based method proposed in previous research. Some case studies have been carried out to verify the effectiveness of the proposed method.
Scheduling revisited workstations in integrated-circuit fabrication
NASA Technical Reports Server (NTRS)
Kline, Paul J.
1992-01-01
The cost of building new semiconductor wafer fabrication factories has grown rapidly, and a state-of-the-art fab may cost 250 million dollars or more. Obtaining an acceptable return on this investment requires high productivity from the fabrication facilities. This paper describes the Photo Dispatcher system, which was developed to make machine-loading recommendations on a set of key fab machines. Dispatching policies that generally perform well in job shops (e.g., Shortest Remaining Processing Time) perform poorly for workstations such as photolithography, which are visited several times by the same lot of silicon wafers. The Photo Dispatcher evaluates the history of workloads throughout the fab and identifies bottleneck areas. The scheduler then assigns priorities to lots depending on where they are headed after photolithography. These priorities are designed to avoid starving bottleneck workstations and to give preference to lots that are headed to areas where they can be processed with minimal waiting. Other factors considered by the scheduler to establish priorities are the nearness of a lot to the end of its process flow and the time that the lot has already been waiting in queue. Simulations that model the equipment and products in one of Texas Instruments' wafer fabs show the Photo Dispatcher can produce a 10 percent improvement in the time required to fabricate integrated circuits.
Deng, Qianwang; Gong, Guiliang; Gong, Xuran; Zhang, Like; Liu, Wei; Ren, Qinghua
2017-01-01
The flexible job-shop scheduling problem (FJSP) is an NP-hard problem that inherits the characteristics of the job-shop scheduling problem (JSP). This paper presents a bee evolutionary guiding nondominated sorting genetic algorithm II (BEG-NSGA-II) for multiobjective FJSP (MO-FJSP) with the objectives of minimizing the maximal completion time, the workload of the most loaded machine, and the total workload of all machines. It adopts a two-stage optimization mechanism. In the first stage, the NSGA-II algorithm is run for T iterations to obtain the initial population N, with a bee evolutionary guiding scheme used to exploit the solution space extensively. In the second stage, the NSGA-II algorithm is run again for GEN iterations to obtain the Pareto-optimal solutions. To enhance the searching ability and avoid premature convergence, an updating mechanism is employed in this stage: the population consists of three parts, each of which changes with the iteration count. Numerical simulations are carried out on published benchmark instances. Finally, the effectiveness of the proposed BEG-NSGA-II algorithm is shown by comparing its experimental results with those of existing well-known algorithms. PMID:28458687
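NSGA-II ranks candidate solutions by Pareto dominance, so a small sketch of that relation may help when reading the three objectives listed in this abstract. The code below is not BEG-NSGA-II and contains no bee guiding scheme or genetic operators; it only shows, for hypothetical objective tuples of the form (makespan, maximum machine workload, total workload), how a first nondominated front is extracted when every objective is minimized. The function names are assumptions.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one.

    All objectives are minimized.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def nondominated_front(points):
    """Return the points that no other point dominates (the first Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical schedules scored as (makespan, max workload, total workload).
    schedules = [(10, 6, 20), (11, 5, 19), (12, 7, 22)]
    print(nondominated_front(schedules))
```

In the example, the third schedule is worse than the first in all three objectives and is dropped, while the first two trade makespan against workload and both remain on the front.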
Skipping Strategy (SS) for Initial Population of Job-Shop Scheduling Problem
NASA Astrophysics Data System (ADS)
Abdolrazzagh-Nezhad, M.; Nababan, E. B.; Sarim, H. M.
2018-03-01
Generating the initial population in the job-shop scheduling problem (JSSP) is an essential step toward obtaining a near-optimal solution. Techniques used to solve JSSP are computationally demanding. A skipping strategy (SS) is employed to acquire the initial population after the sequence of jobs on each machine and the sequence of operations (expressed in Plates-jobs and mPlates-jobs) are determined. The proposed technique is applied to benchmark datasets and the results are compared to those of other initialization techniques. It is shown that the initial population obtained from the SS approach can generate an optimal solution.
Due-Window Assignment Scheduling with Variable Job Processing Times
Wu, Yu-Bin
2015-01-01
We consider a common due-window assignment scheduling problem with variable job processing times on a single machine, where the processing time of a job is a function of its position in a sequence (i.e., learning effect) or its starting time (i.e., deteriorating effect). The problem is to determine the optimal due-window and the processing sequence simultaneously to minimize a cost function that includes earliness, tardiness, the window location, the window size, and the weighted number of tardy jobs. We prove that the problem can be solved in polynomial time. PMID:25918745
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pautz, Shawn D.; Bailey, Teresa S.
Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amount of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined on up to 10^5 processor cores.