Threading Building Blocks

Threading Building Blocks
Developer(s): Intel
Stable release: 2021.4 / October 4, 2021 [1]
Repository: github.com/oneapi-src/oneTBB
Written in: C++
Operating systems: FreeBSD, Linux, Solaris, macOS, Windows, Android
Type: Library or framework
License: Dual commercial / open source (Apache 2.0), plus freeware [2]
Website: github.com/oneapi-src/oneTBB; intel.com/oneTBB

oneAPI Threading Building Blocks (oneTBB), formerly Threading Building Blocks (TBB), is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.
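
For example, a loop can be expressed as a parallel algorithm and the library schedules the resulting tasks onto its worker threads. A minimal sketch (the vector size and the scaling operation are illustrative):

#include <tbb/parallel_for.h>
#include <vector>

int main() {
    std::vector<float> data(1000000, 1.0f);
    // oneTBB splits the index range into tasks and schedules them
    // onto its internal pool of worker threads.
    tbb::parallel_for(std::size_t(0), data.size(), [&](std::size_t i) {
        data[i] *= 2.0f;   // each iteration may run on any worker thread
    });
    return 0;
}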

Overview

A oneTBB program creates, synchronizes, and destroys graphs of dependent tasks according to algorithms, i.e. high-level parallel programming paradigms (also known as algorithmic skeletons). Tasks are then executed respecting graph dependencies. This approach places oneTBB in a family of parallel programming techniques that aim to decouple the program from the particulars of the underlying machine.
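
The flow graph interface illustrates this dependency-graph style: nodes are connected by edges and a node runs once its inputs are ready. A sketch (the node names and the doubling/printing bodies are illustrative):

#include <tbb/flow_graph.h>
#include <iostream>

int main() {
    tbb::flow::graph g;

    // First node doubles its input; second node prints the result.
    tbb::flow::function_node<int, int> doubler(
        g, tbb::flow::unlimited, [](int x) { return 2 * x; });
    tbb::flow::function_node<int, tbb::flow::continue_msg> printer(
        g, tbb::flow::serial, [](int x) {
            std::cout << x << "\n";
            return tbb::flow::continue_msg();
        });

    tbb::flow::make_edge(doubler, printer);  // printer depends on doubler

    for (int i = 0; i < 10; ++i)
        doubler.try_put(i);   // inject work into the graph
    g.wait_for_all();         // block until all tasks have finished
    return 0;
}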

oneTBB implements work stealing to balance a parallel workload across available processing cores in order to increase core utilization and therefore scaling. Initially, the workload is evenly divided among the available processor cores. If one core completes its work while other cores still have a significant amount of work in their queues, oneTBB reassigns some of the work from one of the busy cores to the idle core. This dynamic capability decouples the programmer from the machine, allowing applications written using the library to scale to use the available processing cores with no changes to the source code or the executable program file. In a 2008 assessment of the work stealing implementation in TBB, researchers from Princeton University found that it was suboptimal for large numbers of processor cores, with up to 47% of computing time spent on scheduling overhead when running certain benchmarks on a 32-core system. [3]
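
Work stealing matters most for irregular, recursively generated work, for example when tasks spawned with tbb::task_group fan out unevenly and idle workers steal pending tasks from busy ones. A sketch (the Fibonacci recursion and the cutoff value are illustrative only):

#include <tbb/task_group.h>

// Naive recursive Fibonacci: each call spawns its two halves as separate
// tasks, producing an irregular task tree that the scheduler balances
// across cores by work stealing.
long fib(int n) {
    if (n < 20)                       // cutoff: run small subproblems serially
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    long a = 0, b = 0;
    tbb::task_group tg;
    tg.run([&] { a = fib(n - 1); });  // may be stolen by an idle worker
    tg.run([&] { b = fib(n - 2); });
    tg.wait();                        // join both child tasks
    return a + b;
}

int main() { return fib(30) > 0 ? 0 : 1; }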

oneTBB, like the STL (and the part of the C++ standard library based on it), uses templates extensively. This has the advantage of low-overhead polymorphism, since templates are a compile-time construct which modern C++ compilers can largely optimize away.
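
For instance, tbb::parallel_reduce is a template parameterized over the range, body, and reduction types, so the lambdas below are instantiated and typically inlined at compile time rather than dispatched through virtual calls. A sketch (the summation itself is illustrative):

#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <functional>
#include <vector>

double sum(const std::vector<double>& v) {
    // Range, body, and reduction functor are all template parameters.
    return tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()),
        0.0,
        [&](const tbb::blocked_range<std::size_t>& r, double partial) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                partial += v[i];
            return partial;
        },
        std::plus<double>());
}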

oneTBB is available commercially as a binary distribution with support, [4] and as open-source software in both source and binary forms.

oneTBB does not provide guarantees of determinism or freedom from data races. [5]
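
In particular, nothing prevents concurrent loop bodies from racing on shared state; the programmer must use atomics, mutexes, or concurrent containers. A sketch illustrating the point (the counter and iteration count are illustrative):

#include <tbb/parallel_for.h>
#include <atomic>
#include <cstddef>

int main() {
    constexpr std::size_t n = 1000000;

    long unsafe = 0;                 // plain shared variable: racy
    std::atomic<long> safe{0};       // atomic: well-defined under contention

    tbb::parallel_for(std::size_t(0), n, [&](std::size_t) {
        ++unsafe;                    // data race: final value is unpredictable
        ++safe;                      // correct, though serialized on the counter
    });

    return safe.load() == static_cast<long>(n) ? 0 : 1;
}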

Library contents

oneTBB is a collection of components for parallel programming:

- Generic parallel algorithms: parallel_for, parallel_reduce, parallel_scan, parallel_for_each, parallel_invoke, parallel_pipeline and parallel_sort
- Flow graph: classes for expressing computations as graphs of dependent nodes connected by edges
- Concurrent containers: concurrent_hash_map, concurrent_unordered_map and set variants, concurrent_queue, concurrent_bounded_queue, concurrent_priority_queue and concurrent_vector
- Memory allocation: scalable_allocator, cache_aligned_allocator and the scalable_malloc/scalable_free C interface
- Synchronization primitives: mutexes (including spin and queuing variants) and atomic operations
- Task scheduler: task_group and task_arena for direct control over how tasks are created and where they run
- Thread-local storage: enumerable_thread_specific and combinable
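
A short sketch combining two of these components, a parallel algorithm filling a concurrent container without explicit locking (the values inserted are illustrative):

#include <tbb/parallel_for.h>
#include <tbb/concurrent_vector.h>
#include <cstddef>

int main() {
    tbb::concurrent_vector<std::size_t> squares;
    // concurrent_vector supports push_back from many threads at once,
    // so the parallel loop needs no explicit synchronization.
    tbb::parallel_for(std::size_t(0), std::size_t(1000), [&](std::size_t i) {
        squares.push_back(i * i);
    });
    return squares.size() == 1000 ? 0 : 1;
}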

Systems supported

The hardware, operating system, and software prerequisites for oneTBB.

Supported Hardware

Supported Operating Systems

Systems with Microsoft Windows operating systems:

Systems with Linux operating systems:

Systems with macOS operating systems:

Systems with Android operating systems:

Supported Compilers


See also

Notes

  1. "oneAPI Threading Building Blocks Github Releases".
  2. "No Cost Options for Intel Support yourself, Royalty-Free".
  3. Contreras, Gilberto; Martonosi, Margaret (2008). Characterizing and improving the performance of Intel Threading Building Blocks (PDF). IEEE Int'l Symp. on Workload Characterization.
  4. "Intel Threading Building Blocks Commercial Version Homepage". software.intel.com/en-us/intel-tbb.
  5. Bocchino Jr., Robert L.; Adve, Vikram S.; Adve, Sarita V.; Snir, Marc (2009). Parallel Programming Must Be Deterministic by Default. USENIX Workshop on Hot Topics in Parallelism.

Related Research Articles

Thread (computing) Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems, but in most cases a thread is a component of a process. The multiple threads of a given process may be executed concurrently, sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

SIMD

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

Parallel computing Programming paradigm in which many processes are executed simultaneously

Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. As power consumption by computers has become a concern in recent years, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multi-core processors.

In computer science, a lock or mutex is a synchronization primitive: a mechanism that enforces limits on access to a resource when there are many threads of execution. A lock is designed to enforce a mutual exclusion concurrency control policy, and with a variety of possible methods there exist multiple unique implementations for different applications.

OpenMP Open standard for parallelizing software

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
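
A minimal illustration of that directive style in C++ (the array size is illustrative; the code assumes an OpenMP-enabled compiler, e.g. one invoked with -fopenmp):

#include <omp.h>
#include <vector>

int main() {
    std::vector<double> v(1000000, 1.0);
    // The directive asks the compiler to split the loop iterations
    // across the threads of an OpenMP team.
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(v.size()); ++i)
        v[i] *= 2.0;
    return 0;
}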

In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress. "Non-blocking" was used as a synonym for "lock-free" in the literature until the introduction of obstruction-freedom in 2003.
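
As a sketch, a lock-free update loop built on compare-and-swap retries instead of a lock; because no thread ever holds a lock, a suspended thread cannot block the others (the counter is illustrative):

#include <atomic>

std::atomic<long> counter{0};

// Lock-free update: retry the compare-and-swap until it succeeds.
// Some thread always makes progress, which is the lock-freedom guarantee.
void add_sample(long value) {
    long observed = counter.load(std::memory_order_relaxed);
    while (!counter.compare_exchange_weak(observed, observed + value)) {
        // On failure, 'observed' is refreshed with the current value; retry.
    }
}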

In concurrent programming, a monitor is a synchronization construct that allows threads to have both mutual exclusion and the ability to wait (block) for a certain condition to become false. Monitors also have a mechanism for signaling other threads that their condition has been met. A monitor consists of a mutex (lock) object and condition variables. A condition variable essentially is a container of threads that are waiting for a certain condition. Monitors provide a mechanism for threads to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task.
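
A sketch of that structure in C++ using std::mutex and std::condition_variable (the queue and item type are illustrative):

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;                      // the monitor's lock
std::condition_variable not_empty; // its condition variable
std::queue<int> items;

void produce(int x) {
    std::lock_guard<std::mutex> lock(m);
    items.push(x);
    not_empty.notify_one();        // signal waiters that the condition holds
}

int consume() {
    std::unique_lock<std::mutex> lock(m);
    // Give up the lock while waiting; reacquire it once the queue is non-empty.
    not_empty.wait(lock, [] { return !items.empty(); });
    int x = items.front();
    items.pop();
    return x;
}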

Multi-core processor Microprocessor with more than one processing unit

A multi-core processor is a computer processor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core.

Intel C++ Compiler Classic and Intel oneAPI DPC++/C++ Compiler are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Microsoft Windows, Linux, and macOS operating systems.

Task parallelism is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurrently performed by processes or threads—across different processors. In contrast to data parallelism which involves running the same task on different components of data, task parallelism is distinguished by running many different tasks at the same time on the same data. A common type of task parallelism is pipelining which consists of moving a single set of data through a series of separate tasks where each task can execute independently of the others.
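
A sketch of pipelining, here expressed with oneTBB's parallel_pipeline: items flow through the stages in order while different stages process different items concurrently (the stage bodies and token count are illustrative):

#include <tbb/parallel_pipeline.h>

long sum_of_squares(int limit) {
    int i = 0;
    long total = 0;
    tbb::parallel_pipeline(
        8,  // at most 8 items in flight at once
        // Stage 1 (serial): generate the next input, or stop the pipeline.
        tbb::make_filter<void, int>(tbb::filter_mode::serial_in_order,
            [&](tbb::flow_control& fc) -> int {
                if (i >= limit) { fc.stop(); return 0; }
                return i++;
            }) &
        // Stage 2 (parallel): transform items independently.
        tbb::make_filter<int, long>(tbb::filter_mode::parallel,
            [](int x) { return static_cast<long>(x) * x; }) &
        // Stage 3 (serial): accumulate results in order.
        tbb::make_filter<long, void>(tbb::filter_mode::serial_in_order,
            [&](long x) { total += x; }));
    return total;
}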

Parallel Extensions

Parallel Extensions was the development name for a managed concurrency library developed by a collaboration between Microsoft Research and the CLR team at Microsoft. The library was released in version 4.0 of the .NET Framework. It is composed of two parts: Parallel LINQ (PLINQ) and Task Parallel Library (TPL). It also consists of a set of coordination data structures (CDS) – sets of data structures used to synchronize and co-ordinate the execution of concurrent tasks.

Grand Central Dispatch is a technology developed by Apple Inc. to optimize application support for systems with multi-core processors and other symmetric multiprocessing systems. It is an implementation of task parallelism based on the thread pool pattern. The fundamental idea is to move the management of the thread pool out of the hands of the developer, and closer to the operating system. The developer injects "work packages" into the pool oblivious of the pool's architecture. This model improves simplicity, portability and performance.

Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.

Concurrent Collections is a programming model for software frameworks to expose parallelism in applications. The Concurrent Collections model originated from tagged stream processing development with HP TStreams.

STAPL is a library for C++, similar to and compatible with the STL. It provides parallelism support for writing applications for systems with shared or distributed memory.

Intel Array Building Blocks was a C++ library developed by Intel Corporation for exploiting data parallel portions of programs to take advantage of multi-core processors, graphics processing units and Intel Many Integrated Core Architecture processors. ArBB provides a generalized vector parallel programming solution designed to avoid direct dependencies on particular low-level parallelism mechanisms or hardware architectures. ArBB is oriented to applications that require data-intensive mathematical computations. By default, ArBB programs cannot create data races or deadlocks.

The following outline is provided as an overview of and topical guide to C++:

This page is a glossary of operating systems terminology.

Fork–join model way of setting up and executing parallel computer programs

In parallel computing, the fork–join model is a way of setting up and executing parallel programs, such that execution branches off in parallel at designated points in the program, to "join" (merge) at a subsequent point and resume sequential execution. Parallel sections may fork recursively until a certain task granularity is reached. Fork–join can be considered a parallel design pattern. It was formulated as early as 1963.
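
A small fork–join sketch using oneTBB's parallel_invoke, which forks the two lambdas and joins on both before execution continues (the recursive sum and cutoff are illustrative):

#include <tbb/parallel_invoke.h>
#include <cstddef>
#include <vector>

// Fork-join recursive sum: each level forks two subtasks, joins on both,
// then combines their results and resumes sequential execution.
long sum(const std::vector<long>& v, std::size_t lo, std::size_t hi) {
    if (hi - lo < 1000) {                     // small ranges: stay sequential
        long s = 0;
        for (std::size_t i = lo; i < hi; ++i) s += v[i];
        return s;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;
    tbb::parallel_invoke(                     // fork ...
        [&] { left = sum(v, lo, mid); },
        [&] { right = sum(v, mid, hi); });    // ... and join here
    return left + right;
}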
