Parallel Extensions

Parallel Extensions was the development name for a managed concurrency library developed in a collaboration between Microsoft Research and the CLR team at Microsoft. The library was released in version 4.0 of the .NET Framework. [1] It is composed of two parts: Parallel LINQ (PLINQ) and the Task Parallel Library (TPL). [2] [3] It also includes a set of coordination data structures (CDS) – data structures used to synchronize and coordinate the execution of concurrent tasks. [4]

Parallel LINQ

PLINQ, or Parallel LINQ, parallelizes the execution of queries on objects (LINQ to Objects) and XML data (LINQ to XML). PLINQ is intended for exposing data parallelism by use of queries. [2] Any computation on objects that has been implemented as a query can be parallelized by PLINQ. However, the objects need to implement the IParallelEnumerable interface, which is defined by PLINQ itself. Internally it uses TPL for execution. [4] [5]
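
In the .NET 4 release this capability shipped as the AsParallel extension method, which converts an ordinary enumerable into a parallel query (ParallelQuery<T> took the place of the CTP-era IParallelEnumerable). A minimal sketch of a parallelized query:

```csharp
using System;
using System.Linq;

class PlinqExample
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();

        // AsParallel() opts the query into PLINQ; the filtering, mapping
        // and final reduction are partitioned across worker threads.
        long sumOfEvenSquares = numbers
            .AsParallel()
            .Where(n => n % 2 == 0)
            .Select(n => (long)n * n)
            .Sum();

        Console.WriteLine(sumOfEvenSquares);
    }
}
```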

Task Parallel Library

The Task Parallel Library (TPL) is the task parallelism component of the Parallel Extensions to .NET. [6] It exposes parallel constructs like parallel For and ForEach loops, using regular method calls and delegates, so the constructs can be used from any CLI language. The job of spawning and terminating threads, as well as scaling the number of threads according to the number of available processors, is handled by the library itself, [3] using a work stealing scheduler. [7]

TPL also includes other constructs like Task and Future . A Task is an action that can be executed independently of the rest of the program. In that sense, it is semantically equivalent to a thread, except that it is a more lightweight object and comes without the overhead of creating an OS thread. Tasks are queued by a Task Manager object and are scheduled to run on multiple OS threads in a thread pool when their turn comes.
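
A minimal sketch of creating and waiting on a Task with the released .NET 4 API, where queuing is handled by the default scheduler rather than an explicitly visible Task Manager:

```csharp
using System;
using System.Threading.Tasks;

class TaskExample
{
    static void Main()
    {
        // The delegate is queued to the scheduler and runs on a
        // thread-pool thread; no dedicated OS thread is created for it.
        Task task = Task.Factory.StartNew(() =>
            Console.WriteLine("running independently of Main"));

        task.Wait();   // block until the task has completed
    }
}
```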

A Future is a task that returns a result. The result is computed in a background thread encapsulated by the Future object, and the result is buffered until it is retrieved. [3] If an attempt is made to retrieve the result before it has been computed, the requesting thread will block until the result is available. [6]
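
In the released library, Future shipped as the generic Task<TResult>; reading its Result property blocks exactly as described above. A minimal sketch:

```csharp
using System;
using System.Threading.Tasks;

class FutureExample
{
    static void Main()
    {
        // Task<TResult> is the shipped form of Future: the result is
        // computed in the background and buffered until retrieved.
        Task<long> sum = Task<long>.Factory.StartNew(() =>
        {
            long total = 0;
            for (int i = 1; i <= 1000000; i++)
                total += i;
            return total;
        });

        // Reading Result blocks this thread until the value is available.
        Console.WriteLine(sum.Result);
    }
}
```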

The other main construct in TPL is the Parallel class. TPL provides a basic form of structured parallelism via three static methods in the Parallel class, illustrated in the sketch after this list:

Parallel.Invoke – Executes an array of Action delegates in parallel, then waits for them to complete
Parallel.For – Parallel equivalent of a C# for loop
Parallel.ForEach – Parallel equivalent of a C# foreach loop
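
A minimal sketch exercising all three methods (the delegates shown are illustrative):

```csharp
using System;
using System.Threading.Tasks;

class ParallelClassExample
{
    static void Main()
    {
        // Parallel.Invoke: run independent actions, then wait for all of them.
        Parallel.Invoke(
            () => Console.WriteLine("first action"),
            () => Console.WriteLine("second action"));

        // Parallel.For: counterpart of for (int i = 0; i < 5; i++).
        Parallel.For(0, 5, i => Console.WriteLine("iteration " + i));

        // Parallel.ForEach: counterpart of foreach (string word in words).
        string[] words = { "alpha", "beta", "gamma" };
        Parallel.ForEach(words, word => Console.WriteLine(word));
    }
}
```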

Architecture

The main concept in the Parallel Extensions to .NET is a Task, which is a small unit of code, usually represented as a lambda function, that can be executed independently. Both PLINQ and the TPL API provide methods to create Tasks – PLINQ divides a query into smaller Tasks, and the Parallel.For, Parallel.ForEach and Parallel.Invoke methods divide a loop into Tasks.

PFX includes a Task Manager object which schedules the Tasks for execution. A Task Manager contains a global queue of Tasks, which are then executed. It also encapsulates multiple threads on which the Tasks are executed. By default, as many threads as there are processors (or processor cores) on the system are created, though this number may be manually modified. Each thread is associated with a thread-specific queue of Tasks. When idle, each thread picks up a batch of Tasks from the global queue and puts them on its local queue, where they are then executed, one by one. If the global queue is empty, a thread will look for Tasks in the queues of its peers, taking the Tasks which have been in the queue the longest (work stealing). During execution, Tasks run independently, with the change in state of one Task independent of the others. As a result, Tasks that use a shared resource still need to be synchronized manually using locks or other constructs.
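
For example, Tasks that increment a shared counter must serialize that update themselves. A minimal sketch using the C# lock statement (the counter and gate fields are illustrative; Interlocked.Increment would serve equally well here):

```csharp
using System;
using System.Threading.Tasks;

class SharedCounterExample
{
    static int counter;                          // shared resource
    static readonly object gate = new object();  // lock object guarding it

    static void Main()
    {
        // Iterations run on multiple pool threads, so the read-modify-write
        // on the shared counter must be synchronized manually.
        Parallel.For(0, 100000, i =>
        {
            lock (gate)
            {
                counter++;
            }
        });

        Console.WriteLine(counter);   // always prints 100000
    }
}
```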

Related Research Articles

Thread (computing) – Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems. In Modern Operating Systems, Tanenbaum shows that many distinct models of process organization are possible. In many cases, a thread is a component of a process. The multiple threads of a given process may be executed concurrently, sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

Parallel computing – Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

OpenMP – Open standard for parallel programming

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

Thread pool – Software design pattern

In computer programming, a thread pool is a software design pattern for achieving concurrency of execution in a computer program. Often also called a replicated workers or worker-crew model, a thread pool maintains multiple threads waiting for tasks to be allocated for concurrent execution by the supervising program. By maintaining a pool of threads, the model increases performance and avoids latency in execution due to frequent creation and destruction of threads for short-lived tasks. The number of available threads is tuned to the computing resources available to the program.
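
In .NET, this pattern is exposed directly through the System.Threading.ThreadPool class; a minimal sketch of queuing a short-lived work item to an existing pool thread:

```csharp
using System;
using System.Threading;

class ThreadPoolExample
{
    static void Main()
    {
        var done = new ManualResetEvent(false);

        // Reuse an existing pool thread instead of creating and
        // destroying an OS thread for this short-lived task.
        ThreadPool.QueueUserWorkItem(state =>
        {
            Console.WriteLine("work item executed on a pool thread");
            done.Set();
        });

        done.WaitOne();   // wait until the work item signals completion
    }
}
```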

Cilk, Cilk++, Cilk Plus and OpenCilk are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops and the fork–join idiom.

Task (computing) – Unit of execution or work in software

In computing, a task is a unit of execution or a unit of work. The term is ambiguous; precise alternative terms include process, light-weight process, thread, step, request, or query. In a typical arrangement, there are queues of incoming work to do and outgoing completed work, and a thread pool of threads to perform this work. Either the work units themselves or the threads that perform the work can be referred to as "tasks", and these can be referred to respectively as requests/responses/threads, incoming tasks/completed tasks/threads, or requests/responses/tasks.

In computer programming, a runtime system or runtime environment is a sub-system that exists both in the computer where a program is created, as well as in the computers where the program is intended to be run. The name comes from the compile time and runtime division from compiled languages, which similarly distinguishes the computer processes involved in the creation of a program (compilation) and its execution in the target machine.

In computer science, future, promise, delay, and deferred refer to constructs used for synchronizing program execution in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.

Automatic parallelization, also auto parallelization, or autoparallelization refers to converting sequential code into multi-threaded and/or vectorized code in order to use multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. Fully automatic parallelization of sequential programs is a challenge because it requires complex program analysis and the best approach may depend upon parameter values that are not known at compilation time.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

Binary Modular Dataflow Machine

Binary Modular Dataflow Machine (BMDFM) is a software package that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.

Task parallelism is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurrently performed by processes or threads—across different processors. In contrast to data parallelism which involves running the same task on different components of data, task parallelism is distinguished by running many different tasks at the same time on the same data. A common type of task parallelism is pipelining, which consists of moving a single set of data through a series of separate tasks where each task can execute independently of the others.

Concurrency and Coordination Runtime (CCR) is an asynchronous programming library based on .NET Framework from Microsoft distributed with Microsoft Robotics Developer Studio (MRDS). Even though it comes with MRDS, it is not limited to modelling robotic behavior but can be used to express asynchronous behavior in any application.

oneAPI Threading Building Blocks (oneTBB, formerly TBB) is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

Grand Central Dispatch (GCD) is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors. It does so efficiently in terms of execution time, memory usage, and inter-processor communication.

Fork–join model – Way of setting up and executing parallel computer programs

In parallel computing, the fork–join model is a way of setting up and executing parallel programs, such that execution branches off in parallel at designated points in the program, to "join" (merge) at a subsequent point and resume sequential execution. Parallel sections may fork recursively until a certain task granularity is reached. Fork–join can be considered a parallel design pattern. It was formulated as early as 1963.

In computer programming, virtual threads are threads that are scheduled by a runtime library instead of natively by the underlying operating system (OS). Virtual threads allow for tens of millions of preemptive tasks and events without swapping on a 2021 consumer-grade computer, compared to low thousands of operating system threads. Preemptive execution is important to performance gains through parallelism and fast preemptive response times for tens of millions of events. Earlier constructs that are not preemptive, such as coroutines or the largely single-threaded Node.js, introduce delays in responding to asynchronous events such as every incoming request in a server application.

References

  1. "What's New in the .NET Framework 4" . Retrieved 21 September 2011.
  2. 1 2 "Programming in the Age of Concurrency: Concurrent Programming with PFX" . Retrieved 16 October 2007.
  3. 1 2 3 "MSDN Magazine: Task Parallel Library". Archived from the original on 14 October 2007. Retrieved 16 October 2007.
  4. 1 2 "June 2008 CTP – Parallel Extensions to the .NET FX" . Retrieved 6 August 2008.
  5. "More powerful aggregations in PLINQ" . Retrieved 6 August 2008.
  6. 1 2 Duffy, Joe (2009). Concurrent Programming on Windows. pp. 887–929. ISBN   978-0321434821.
  7. Leijen, Daan; Schulte, Wolfram; Burckhardt, Sebastian (2009). "The Design of a Task Parallel Library". ACM SIGPLAN Notices. 44 (10): 227. CiteSeerX   10.1.1.146.4197 . doi:10.1145/1639949.1640106.