Concurrent Collections

Last updated December 23, 2023

Concurrent Collections (CnC) is a programming model that software frameworks use to expose the parallelism in applications. The Concurrent Collections concept originated in the development of tagged stream processing with HP TStreams.

TStreams

Around 2003, the Hewlett-Packard Cambridge Research Lab developed TStreams, a stream-processing system that anticipated the basic concepts of CnC.[1][2][3]

Concurrent Collections for C++

Concurrent Collections for C++ is an open source C++ template library developed by Intel for implementing parallel CnC applications in C++ with shared and/or distributed memory.

Habanero CnC

Rice University has developed several CnC language implementations based on its Habanero project infrastructure.

Notes

[1] TStreams: How to Write a Parallel Program (Technical report). Archived from the original on 2019-02-07. Retrieved 2014-09-07.
[2] TStreams: A Model of Parallel Computation (Technical report). Archived from the original on 2014-09-07. Retrieved 2014-09-07.
[3] Compiling to TStreams, a New Model of Parallel Computation (Technical report).

Related Research Articles

A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another. Distributed computing is a field of computer science that studies distributed systems.

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.

Cilk, Cilk++, Cilk Plus and OpenCilk are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loops and the fork–join idiom.

Charles Eric Leiserson is a computer scientist specializing in the theory of parallel and distributed computing, and particularly in its practical applications. As part of this effort, he developed the Cilk multithreaded language, which uses a provably good work-stealing algorithm for scheduling. He invented the fat-tree interconnection network, a hardware-universal interconnection network used in many supercomputers, including the Connection Machine CM-5, for which he was network architect. He helped pioneer the development of VLSI theory, including the retiming method of digital optimization with James B. Saxe and systolic arrays with H. T. Kung. He conceived of the notion of cache-oblivious algorithms, which have no tuning parameters for cache size or cache-line length but nevertheless use the cache near-optimally. Leiserson coauthored the standard algorithms textbook Introduction to Algorithms together with Thomas H. Cormen, Ronald L. Rivest, and Clifford Stein.

In computing, a parallel programming model is an abstraction of parallel computer architecture in which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its generality (how well a range of different problems can be expressed for a variety of different architectures) and its performance (how efficiently the compiled programs can execute). A parallel programming model can be implemented as a library invoked from a sequential language, as an extension to an existing language, or as an entirely new language.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions, but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core.


The Intel Personal SuperComputer (Intel iPSC) was a product line of parallel computers in the 1980s and 1990s. The iPSC/1 was superseded by the Intel iPSC/2 and then by the Intel iPSC/860.

The bulk synchronous parallel (BSP) abstract computer is a bridging model for designing parallel algorithms. It is similar to the parallel random access machine (PRAM) model, but unlike PRAM, BSP does not take communication and synchronization for granted. In fact, quantifying the requisite synchronization and communication is an important part of analyzing a BSP algorithm.

RapidMind Inc. was a privately held company founded and headquartered in Waterloo, Ontario, Canada, and acquired by Intel in 2009. It provided a software product that aimed to make it simpler for software developers to target multi-core processors and accelerators such as graphics processing units (GPUs).

oneAPI Threading Building Blocks (oneTBB, formerly Threading Building Blocks or TBB) is a C++ template library developed by Intel for parallel programming on multi-core processors. Using TBB, a computation is broken down into tasks that can run in parallel. The library manages and schedules threads to execute these tasks.

Parallel Extensions was the development name for a managed concurrency library developed by a collaboration between Microsoft Research and the CLR team at Microsoft. The library was released in version 4.0 of the .NET Framework. It is composed of two parts: Parallel LINQ (PLINQ) and the Task Parallel Library (TPL). It also includes a set of coordination data structures (CDS), which are used to synchronize and coordinate the execution of concurrent tasks.

Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.

Software is said to exhibit scalable parallelism if it can make use of additional processors to solve larger problems; that is, the term refers to software for which Gustafson's law holds. Consider a program whose execution time is dominated by one or more loops, each of which updates every element of an array, for example the following finite-difference heat-equation stencil calculation:

    for t := 0 to T do
        for i := 1 to N-1 do
            new(i) := (A(i-1) + A(i) + A(i) + A(i+1)) * .25
            // explicit forward-difference with R = 0.25
        end
        for i := 1 to N-1 do
            A(i) := new(i)
        end
    end

Intel Array Building Blocks (ArBB) was a C++ library developed by Intel Corporation for exploiting data-parallel portions of programs to take advantage of multi-core processors, graphics processing units and Intel Many Integrated Core Architecture processors. ArBB provides a generalized vector parallel programming solution designed to avoid direct dependencies on particular low-level parallelism mechanisms or hardware architectures. ArBB is oriented to applications that require data-intensive mathematical computations. By default, ArBB programs cannot create data races or deadlocks.

Intel Parallel Building Blocks (PBB) was a collection of three programming solutions designed for multithreaded parallel computing. PBB consisted of Cilk Plus, Threading Building Blocks (TBB) and Intel Array Building Blocks (ArBB).

In parallel computing, work stealing is a scheduling strategy for multithreaded computer programs. It solves the problem of executing a dynamically multithreaded computation, one that can "spawn" new threads of execution, on a statically multithreaded computer, with a fixed number of processors. It does so efficiently in terms of execution time, memory usage, and inter-processor communication.

In parallel computing, the fork–join model is a way of setting up and executing parallel programs, such that execution branches off in parallel at designated points in the program, to "join" (merge) at a subsequent point and resume sequential execution. Parallel sections may fork recursively until a certain task granularity is reached. Fork–join can be considered a parallel design pattern. It was formulated as early as 1963.

Richard Vuduc is a tenured professor of computer science at the Georgia Institute of Technology. His research lab, The HPC Garage, studies high-performance computing, scientific computing, parallel algorithms, modeling, and engineering. He is a member of the Association for Computing Machinery (ACM). As of 2022, Vuduc serves as Vice President of the SIAM Activity Group on Supercomputing. He has co-authored over 200 articles in peer-reviewed journals and conferences.

References

Budimlic, Z.; Chandramowlishwaran, A. M.; Knobe, K.; Lowney, G. N.; Sarkar, V.; Treggiari, L. (2008). Declarative aspects of memory management in the concurrent collections parallel programming model (PDF). DAMP '09. Proceedings of the 4th workshop on Declarative aspects of multicore programming. pp. 47–58. doi:10.1145/1481839.1481846. ISBN 978-1-60558-417-1.
Budimlić, Z.; Burke, M.; Cavé, V.; Knobe, K.; Lowney, G.; Newton, R.; Palsberg, J.; Peixotto, D.; Sarkar, V.; Schlimbach, F.; Taşırlar, S. (2010). "Concurrent Collections" (PDF). Scientific Programming. 18 (3–4): 203–217. doi:10.1155/2010/521797. Retrieved 2013-08-25.
Chandramowlishwaran, A.; Knobe, K.; Vuduc, R. (2010). Applying the concurrent collections programming model to asynchronous parallel dense linear algebra (PDF). PPoPP '10. Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of parallel programming. pp. 345–346. doi:10.1145/1693453.1693506. ISBN 978-1-60558-708-0.
Chandramowlishwaran, A.; Knobe, K.; Vuduc, R. (2010). "Performance evaluation of concurrent collections on high-performance multicore computing systems". 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) (PDF). IPDPS 2010. pp. 1–12. CiteSeerX 10.1.1.169.5643. doi:10.1109/IPDPS.2010.5470404. ISBN 978-1-4244-6442-5.
Burke, M. G.; Knobe, K.; Newton, R.; Sarkar, V. (2011). "Concurrent Collections Programming Model". Encyclopedia of Parallel Computing (PDF). Vol. 4. Springer. pp. 364–371. doi:10.1007/978-0-387-09766-4_238. ISBN 978-0-387-09765-7. Retrieved 2013-08-25.
Tang, P. (25 December 2012). "Measuring the overhead of Intel C++ Concurrent Collections over Threading Building Blocks for Gauss–Jordan elimination" (PDF). Concurrency and Computation: Practice and Experience. 24 (18): 2282–2301. doi:10.1002/cpe.2811. S2CID 13585339.

External links

Intel Concurrent Collections for C++ for Windows and Linux at Intel DZ, a "What If" project
- Intel Concurrent Collections for C++ on SourceForge
- Intel Concurrent Collections for C++ at GitHub
CNC - Habanero Concurrent Collections as part of the Rice University Habanero project

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

