Parallel Extensions

The .NET Framework stack (diagram)

Parallel Extensions was the development name for a managed concurrency library developed by a collaboration between Microsoft Research and the CLR team at Microsoft. The library was released in version 4.0 of the .NET Framework. [1] It is composed of two parts: Parallel LINQ (PLINQ) and the Task Parallel Library (TPL). [2] [3] It also includes a set of coordination data structures (CDS) – data structures used to synchronize and coordinate the execution of concurrent tasks. [4]
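
As a hedged illustration of such a coordination data structure (a minimal sketch assuming the System.Collections.Concurrent types that shipped with .NET Framework 4, such as BlockingCollection<T>), a producer task and a consumer task can hand items to each other without any explicit locking:

    // Sketch: a producer task feeds a BlockingCollection<int> while a consumer
    // task drains it; the collection performs the synchronization between the two.
    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    class CdsSketch
    {
        static void Main()
        {
            var queue = new BlockingCollection<int>(4);   // bounded to 4 buffered items

            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 0; i < 10; i++)
                    queue.Add(i);          // blocks if the bound has been reached
                queue.CompleteAdding();    // signals that no more items will arrive
            });

            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (int item in queue.GetConsumingEnumerable())
                    Console.WriteLine(item);
            });

            Task.WaitAll(producer, consumer);
        }
    }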

Managed code is computer program code that requires and will execute only under the management of a Common Language Runtime (CLR) virtual machine, e.g. .NET Core, .NET Framework, or Mono. The term was coined by Microsoft.

Library (computing)

In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subroutines, classes, values or type specifications. In IBM's OS/360 and its successors they are referred to as partitioned data sets.

Microsoft Research is the research subsidiary of Microsoft. It was formed in 1991, with the intent to advance state-of-the-art computing and solve difficult world problems through technological innovation in collaboration with academic, government, and industry researchers. The Microsoft Research team employs more than 1,000 computer scientists, physicists, engineers, and mathematicians, including Turing Award winners, Fields Medal winners, MacArthur Fellows, and Dijkstra Prize winners.

Parallel LINQ

PLINQ, or Parallel LINQ, parallelizes the execution of queries on objects (LINQ to Objects) and XML data (LINQ to XML). PLINQ is intended for exposing data parallelism by use of queries. [2] Any computation on objects that has been implemented as a query can be parallelized by PLINQ. However, the objects need to implement the IParallelEnumerable interface, which is defined by PLINQ itself. Internally it uses TPL for execution. [4] [5]
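
As a minimal sketch (using the AsParallel extension method of the released .NET Framework 4 API rather than the IParallelEnumerable interface of the early previews), a LINQ-to-Objects query is opted into PLINQ as follows:

    // Sketch: inserting AsParallel() turns the sequential LINQ-to-Objects query
    // into a PLINQ query; the Where and Select operators then run on multiple
    // worker threads over partitions of the source.
    using System;
    using System.Linq;

    class PlinqSketch
    {
        static void Main()
        {
            int[] numbers = Enumerable.Range(1, 1000000).ToArray();

            var evenSquares = numbers
                .AsParallel()                 // opt the query into PLINQ
                .Where(n => n % 2 == 0)
                .Select(n => n * n)
                .ToArray();

            Console.WriteLine(evenSquares.Length);
        }
    }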

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

Data parallelism

Data parallelism is parallelization across multiple processors in parallel computing environments. It focuses on distributing the data across different nodes, which operate on the data in parallel. It can be applied on regular data structures like arrays and matrices by working on each element in parallel. It contrasts to task parallelism as another form of parallelism.

Task Parallel Library

The Task Parallel Library (TPL) is the task parallelism component of the Parallel Extensions to .NET. [6] It exposes parallel constructs like parallel For and ForEach loops, using regular method calls and delegates, so the constructs can be used from any CLI language. The job of spawning and terminating threads, as well as scaling the number of threads according to the number of available processors, is done by the library itself, [3] using a work stealing scheduler. [7]
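
For example (a minimal sketch; the ProcessItem method and the input items are purely illustrative), the loop body is supplied as an ordinary delegate, so the same construct works from any CLI language:

    // Sketch: Parallel.ForEach takes the loop body as a delegate (here a method
    // group); the library decides how many threads to use and how to partition
    // the input across them.
    using System;
    using System.Threading.Tasks;

    class TplLoopSketch
    {
        static void ProcessItem(string item)      // hypothetical per-item work
        {
            Console.WriteLine("processing " + item);
        }

        static void Main()
        {
            string[] items = { "a", "b", "c", "d" };
            Parallel.ForEach(items, ProcessItem);  // method group converted to Action<string>
        }
    }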

Task parallelism is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurrently performed by processes or threads—across different processors. In contrast to data parallelism which involves running the same task on different components of data, task parallelism is distinguished by running many different tasks at the same time on the same data. A common type of task parallelism is pipelining which consists of moving a single set of data through a series of separate tasks where each task can execute independently of the others.

A delegate is a form of type-safe function pointer used by the Common Language Infrastructure (CLI). Delegates specify a method to call and optionally an object to call the method on. Delegates are used, among other things, to implement callbacks and event listeners. A delegate object encapsulates a reference to a method. The delegate object can then be passed to code that can call the referenced method, without having to know at compile time which method will be invoked.
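
A minimal sketch of a delegate in C# (the Transform type and Double method are illustrative only):

    using System;

    delegate int Transform(int x);       // a type-safe function pointer type

    class DelegateSketch
    {
        static int Double(int x)
        {
            return x * 2;
        }

        static void Main()
        {
            Transform t = Double;        // the delegate instance references Double
            Console.WriteLine(t(21));    // the caller invokes it without knowing the target at compile time
        }
    }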

Thread (computing)

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. Multiple threads can exist within one process, executing concurrently and sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

TPL also includes other constructs like Task and Future. A Task is an action that can be executed independently of the rest of the program. In that sense, it is semantically equivalent to a thread, except that it is a more lightweight object and comes without the overhead of creating an OS thread. Tasks are queued by a Task Manager object and are scheduled to run on multiple OS threads in a thread pool when their turn comes.
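
A minimal sketch of creating and waiting on a Task with the .NET Framework 4 API:

    // Sketch: a Task wraps a unit of work that the runtime queues onto
    // thread-pool threads; creating one does not create a dedicated OS thread.
    using System;
    using System.Threading.Tasks;

    class TaskSketch
    {
        static void Main()
        {
            Task work = Task.Factory.StartNew(() =>
            {
                Console.WriteLine("running on a pool thread");
            });

            work.Wait();   // block the main thread until the task has finished
        }
    }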

A Future is a task that returns a result. The result is computed in a background thread encapsulated by the Future object, and the result is buffered until it is retrieved. [3] If an attempt is made to retrieve the result before it has been computed, the requesting thread will block until the result is available. [6]
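
In the released library the Future concept surfaces as the generic Task<TResult>; a minimal sketch:

    // Sketch: Task<int> computes its result on a background thread; reading
    // Result blocks the requesting thread until the value is available.
    using System;
    using System.Threading.Tasks;

    class FutureSketch
    {
        static void Main()
        {
            Task<int> future = Task.Factory.StartNew(() =>
            {
                int sum = 0;
                for (int i = 1; i <= 100; i++) sum += i;
                return sum;
            });

            Console.WriteLine(future.Result);   // blocks until the sum (5050) is ready
        }
    }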

The other main construct of the TPL is the Parallel class. TPL provides a basic form of structured parallelism via three static methods in the Parallel class, illustrated in the sketch after this list:

Parallel.Invoke
Executes an array of Action delegates in parallel, and then waits for them to complete
Parallel.For
Parallel equivalent of a C# for loop
Parallel.ForEach
Parallel equivalent of a C# foreach loop
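
A minimal sketch of the first two methods (the actions and the array contents are illustrative only; Parallel.ForEach follows the same pattern, as in the earlier Task Parallel Library sketch):

    // Sketch: Parallel.Invoke runs independent actions side by side and waits
    // for all of them; Parallel.For is the parallel counterpart of a for loop
    // over an index range.
    using System;
    using System.Threading.Tasks;

    class ParallelClassSketch
    {
        static void Main()
        {
            Parallel.Invoke(
                () => Console.WriteLine("first action"),
                () => Console.WriteLine("second action"));

            double[] data = new double[1000];
            Parallel.For(0, data.Length, i => data[i] = Math.Sqrt(i));
        }
    }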

Architecture

The main concept in the Parallel Extensions to .NET is a Task, which is a small unit of code, usually represented as a lambda function, that can be executed independently. Both PLINQ and the TPL API provide methods to create Tasks: PLINQ divides a query into smaller Tasks, and the Parallel.For, Parallel.ForEach and Parallel.Invoke methods divide a loop into Tasks.

PFX includes a Task Manager object which schedules the Tasks for execution. A Task Manager contains a global queue of Tasks, which are then executed. It also encapsulates multiple threads on which the Tasks are executed. By default, as many threads as there are processors (or processor cores) on the system are created, though this number may be manually modified. Each thread is associated with a thread-specific queue of Tasks. When idle, each thread picks up a batch of Tasks and puts them on its local queue, where they are then executed, one by one. If the global queue is empty, a thread will look for Tasks in the queues of its peers, and will take the Tasks which have been in the queue the longest (task stealing). During execution, Tasks run independently of one another; a change in the state of one Task does not affect the others. As a result, if they use a shared resource, they still need to be synchronized manually using locks or other constructs.
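
As a hedged illustration of that last point (the shared counter is purely illustrative), loop bodies that update a shared variable must serialize their updates themselves, for instance with an Interlocked operation or a lock:

    // Sketch: the scheduler runs the loop bodies independently, so updates to
    // the shared variable must be synchronized manually; here Interlocked.Add
    // makes each update atomic.
    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class SharedStateSketch
    {
        static void Main()
        {
            int total = 0;

            Parallel.For(0, 1000, i =>
            {
                Interlocked.Add(ref total, i);   // without this, updates could be lost
            });

            Console.WriteLine(total);            // 499500
        }
    }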

See also

Related Research Articles

OpenMP

OpenMP is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms, instruction set architectures and operating systems, including Solaris, AIX, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

Thread pool

In computer programming, a thread pool is a software design pattern for achieving concurrency of execution in a computer program. Often also called a replicated workers or worker-crew model, a thread pool maintains multiple threads waiting for tasks to be allocated for concurrent execution by the supervising program. By maintaining a pool of threads, the model increases performance and avoids latency in execution due to frequent creation and destruction of threads for short-lived tasks. The number of available threads is tuned to the computing resources available to the program, such as parallel processors, cores, memory, and network sockets.

Cilk, Cilk++ and Cilk Plus are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops and the fork–join idiom.

Task (computing)

In computing, a task is a unit of execution or a unit of work. The term is ambiguous; precise alternative terms include process, light-weight process, thread, step, request, or query. In a typical arrangement, there are queues of incoming work to do and outgoing completed work, and a thread pool of threads to perform this work. Either the work units themselves or the threads that perform the work can be referred to as "tasks", and these can be referred to respectively as requests/responses/threads, incoming tasks/completed tasks/threads, or requests/responses/tasks.

A runtime system, also called run-time system, runtime environment or run-time environment, primarily implements portions of an execution model. This is not to be confused with the runtime lifecycle phase of a program, during which the runtime system is in operation. Most languages have some form of runtime system that provides an environment in which programs run. This environment may address a number of issues including the layout of application memory, how the program accesses variables, mechanisms for passing parameters between procedures, interfacing with the operating system, and otherwise. The compiler makes assumptions depending on the specific runtime system to generate correct code. Typically the runtime system will have some responsibility for setting up and managing the stack and heap, and may include features such as garbage collection, threads or other dynamic features built into the language.

In computer science, future, promise, delay, and deferred refer to constructs used for synchronizing program execution in some concurrent programming languages. They describe an object that acts as a proxy for a result that is initially unknown, usually because the computation of its value is not yet complete.

Automatic parallelization, also auto parallelization, autoparallelization, or parallelization, the last one of which implies automation when used in context, refers to converting sequential code into multi-threaded or vectorized code in order to utilize multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. The goal of automatic parallelization is to relieve programmers from the hectic and error-prone manual parallelization process. Though the quality of automatic parallelization has improved in the past several decades, fully automatic parallelization of sequential programs by compilers remains a grand challenge due to its need for complex program analysis and the unknown factors during compilation.

Concurrent computing is a form of computing in which several computations are executed during overlapping time periods—concurrently—instead of sequentially. This is a property of a system—this may be an individual program, a computer, or a network—and there is a separate execution point or "thread of control" for each computation ("process"). A concurrent system is one where a computation can advance without waiting for all other computations to complete.

Concurrency and Coordination Runtime (CCR) is an asynchronous programming library based on .NET Framework from Microsoft distributed with Microsoft Robotics Developer Studio (MRDS). Even though it comes with MRDS, it is not limited to modelling robotic behavior but can be used to express asynchronous behavior in any application.

Threading Building Blocks (TBB) is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.

Grand Central Dispatch (GCD) is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.

High-performance computing applications that run on massively parallel supercomputers consist of concurrent programs designed using multi-threaded, multi-process models. The applications may consist of various constructs with varying degrees of parallelism. Although high-performance concurrent programs use similar design patterns, models and principles as sequential programs, unlike sequential programs they typically demonstrate non-deterministic behavior. The probability of bugs increases with the number of interactions between the various parallel constructs. Race conditions, data races, deadlocks, missed signals and livelock are common error types.

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors. It does so efficiently in terms of execution time, memory usage, and inter-processor communication.

Fork–join model

In parallel computing, the fork–join model is a way of setting up and executing parallel programs, such that execution branches off in parallel at designated points in the program, to "join" (merge) at a subsequent point and resume sequential execution. Parallel sections may fork recursively until a certain task granularity is reached. Fork–join can be considered a parallel design pattern. It was formulated as early as 1963.

Asynchrony, in computer programming, refers to the occurrence of events independent of the main program flow and ways to deal with such events. These may be "outside" events such as the arrival of signals, or actions instigated by a program that take place concurrently with program execution, without the program blocking to wait for results. Asynchronous input/output is an example of the latter cause of asynchrony, and lets programs issue commands to storage or network devices that service these requests while the processor continues executing the program. Doing so provides a degree of parallelism.

DOPIPE parallelism is a method to perform loop-level parallelism by pipelining the statements in a loop. Pipelined parallelism may exist at different levels of abstraction like loops, functions and algorithmic stages. The extent of parallelism depends upon the programmers' ability to make best use of this concept. It also depends upon factors like identifying and separating the independent tasks and executing them parallelly.

References

  1. "What's New in the .NET Framework 4" . Retrieved 21 September 2011.
  2. 1 2 "Programming in the Age of Concurrency: Concurrent Programming with PFX" . Retrieved 16 October 2007.
  3. 1 2 3 "MSDN Magazine: Task Parallel Library" . Retrieved 16 October 2007.
  4. 1 2 "June 2008 CTP - Parallel Extensions to the .NET FX" . Retrieved 6 August 2008.
  5. "More powerful aggregations in PLINQ" . Retrieved 6 August 2008.
  6. 1 2 Duffy, Joe (2009). Concurrent Programming on Windows. pp. "887–929". ISBN   978-0321434821.
  7. Leijen, Daan; Schulte, Wolfram; Burckhardt, Sebastian (2009). "The Design of a Task Parallel Library". ACM SIGPLAN Notices. 44 (10): 227. CiteSeerX   10.1.1.146.4197 . doi:10.1145/1639949.1640106.