Coscheduling

Last updated September 19, 2021

Coscheduling is the principle for concurrent systems of scheduling related processes to run on different processors at the same time (in parallel). There are various specific implementations to realize this.

If an application consists of a collection of processes working closely together, and if some but not all of the processes are scheduled for execution, the executing processes may attempt to communicate with those that are not executing, which will cause them to block. Eventually the other processes will be scheduled for execution, but by this time the situation may be reversed so that these processes also block waiting for interactions with others. As a result, the application makes progress at the rate of at most one interprocess interaction per time slice, and will have low throughput and high latency.

Implementation

Coscheduling consists of two ideas:

When scheduling any of the processes in the related group, schedule all of them for execution so that they can communicate efficiently.
When a process in the group blocks while communicating with another process in the group, don't remove it from its processor. Instead, leave its state loaded on its processor for a short time, under the assumption that it will receive a response shortly. If this time elapses and the process still hasn't become runnable, then assume it will sleep for a long time and reschedule the processor.

Some coscheduling techniques exhibit fragments of processes that do not run concurrently with the rest of the coscheduled set. The occurrence of these fragments is usually minimized by these algorithms. Gang scheduling is a stricter variant of coscheduling that disallows fragments completely.

Types of coscheduling

Researchers have classified three types of coscheduling: explicit coscheduling, local scheduling and implicit or dynamic coscheduling.^[1]

Explicit coscheduling requires all processing to actually take place at the same time, and is typically implemented by global scheduling across all processors. A specific algorithm is known as gang scheduling.

Local coscheduling allows individual processors to schedule the processing independently.

Dynamic (or implicit) coscheduling is a form of coscheduling where individual processors can still schedule processing independently, but they make scheduling decisions in cooperation with other processors.

History

The term "coscheduling" was introduced by Ousterhout (1982). The original definition is that the process working set must be coscheduled (scheduled for execution simultaneously) for the parallel program to make progress.

Notes

↑ Fabrizio Petrini, Wu-chun Feng. Improved Resource Utilization with Buffered Coscheduling, Journal of Parallel Algorithms and Applications, 2000

Ousterhout, J. K. (1982). "Scheduling Techniques for Concurrent Systems" (PDF). Proceedings of Third International Conference on Distributed Computing Systems: 22–30.

Related Research Articles

In computing, a process is the instance of a computer program that is being executed by one or many threads. It contains the program code and its activity. Depending on the operating system (OS), a process may be made up of multiple threads of execution that execute instructions concurrently.

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. The multiple threads of a given process may be executed concurrently, sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

Superscalar processor CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor that can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows for more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

In computer, scheduling is the action of assigning resources to perform tasks. The resources may be processors, network links or expansion cards. The tasks may be threads, processes or data flows.

Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution.

Execution in computer and software engineering is the process by which a computer or virtual machine reads and acts on the instructions of a computer program. Each instruction of a program is a description of a particular action which must be carried out, in order for a specific problem to be solved. Execution involves repeatedly following a 'fetch–decode–execute' cycle for each instruction. As the executing machine follows the instructions, specific effects are produced in accordance with the semantics of those instructions.

In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress. "Non-blocking" was used as a synonym for "lock-free" in the literature until the introduction of obstruction-freedom in 2003.

In concurrent programming, concurrent accesses to shared resources can lead to unexpected or erroneous behavior, so parts of the program where the shared resource is accessed need to be protected in ways that avoid the concurrent access. This protected section is the critical section or critical region. It cannot be executed by more than one process at a time. Typically, the critical section accesses a shared resource, such as a data structure, a peripheral device, or a network connection, that would not operate correctly in the context of multiple concurrent accesses.

In computer science, thrashing occurs when a computer's virtual memory resources are overused, leading to a constant state of paging and page faults, inhibiting most application-level processing. This causes the performance of the computer to degrade or collapse. The situation can continue indefinitely until either the user closes some running applications or the active processes free up additional virtual memory resources.

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or at the same time simultaneously partial order, without affecting the final outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.

Join Java is a programming language based on the join-pattern that extends the standard Java programming language with the join semantics of the join-calculus. It was written at the University of South Australia within the Reconfigurable Computing Lab by Dr. Von Itzstein.

In computer science, gang scheduling is a scheduling algorithm for parallel systems that schedules related threads or processes to run simultaneously on different processors. Usually these will be threads all belonging to the same process, but they may also be from different processes, where the processes could have a producer-consumer relationship or come from the same MPI program.

In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality: how well a range of different problems can be expressed for a variety of different architectures, and its performance: how efficiently the compiled programs can execute. The implementation of a parallel programming model can take the form of a library invoked from a sequential language, as an extension to an existing language, or as an entirely new language.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

Binary Modular Dataflow Machine (BMDFM) is software that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.

Working set is a concept in computer science which defines the amount of memory that a process requires in a given time interval.

In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific system of storage allocation in use and the particular form of fragmentation. In many cases, fragmentation leads to storage space being "wasted", and in that case the term also refers to the wasted space itself.

In computer architecture, multithreading is the ability of a central processing unit (CPU) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).

High performance computing applications run on massively parallel supercomputers consist of concurrent programs designed using multi-threaded, multi-process models. The applications may consist of various constructs with varying degree of parallelism. Although high performance concurrent programs use similar design patterns, models and principles as that of sequential programs, unlike sequential programs, they typically demonstrate non-deterministic behavior. The probability of bugs increases with the number of interactions between the various parallel constructs. Race conditions, data races, deadlocks, missed signals and live lock are common error types.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Fabrizio Petrini, Wu-chun Feng. Improved Resource Utilization with Buffered Coscheduling, Journal of Parallel Algorithms and Applications, 2000

[1]