Cgroups

cgroups
Original author(s)	v1: Paul Menage, Rohit Seth, Memory Controller by Balbir Singh, CPU controller by Srivatsa Vaddagiri; v2: Tejun Heo
Developer(s)	Tejun Heo, Johannes Weiner, Michal Hocko, Waiman Long, Roman Gushchin, Chris Down et al.
Initial release	2007
Written in	C
Operating system	Linux
Type	System software
License	GPL and LGPL
Website	Cgroup v1, Cgroup v2

Last updated January 04, 2025

cgroups (abbreviated from control groups) is a Linux kernel feature that limits, accounts for, and isolates the resource usage (CPU, memory, disk I/O, etc.^[1]) of a collection of processes.

Engineers at Google started the work on this feature in 2006 under the name "process containers".^[2] In late 2007, the nomenclature changed to "control groups" to avoid confusion caused by multiple meanings of the term "container" in the Linux kernel context, and the control groups functionality was merged into the Linux kernel mainline in kernel version 2.6.24, which was released in January 2008.^[3] Since then, developers have added many new features and controllers, such as support for kernfs in 2014,^[4] firewalling,^[5] and unified hierarchy.^[6] cgroup v2 was merged in Linux kernel 4.5^[7] with significant changes to the interface and internal functionality.^[8]

Versions

There are two versions of cgroups.

Cgroups was originally written by Paul Menage and Rohit Seth, and merged into the mainline Linux kernel in 2007. This original implementation is now called cgroups version 1.^[9]

Development and maintenance of cgroups was then taken over by Tejun Heo, who redesigned and rewrote it. This rewrite is now called version 2; the cgroup v2 documentation first appeared in Linux kernel 4.5, released on 14 March 2016.^[7]

Unlike v1, cgroup v2 has only a single process hierarchy and discriminates between processes, not threads.
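
The single hierarchy makes it possible to tell the two versions apart from user space. The following is a minimal sketch, not a definitive check: it assumes the conventional mount point /sys/fs/cgroup and relies on the root of a v2 hierarchy exposing a cgroup.controllers file. The function name cgroup_layout and its return labels are hypothetical.

```python
from pathlib import Path


def cgroup_layout(root: str = "/sys/fs/cgroup") -> str:
    """Guess which cgroup layout is mounted at `root` (assumed mount point).

    Returns "v2" when `root` looks like a unified (v2) hierarchy,
    "v1-or-hybrid" when it only contains sub-directories (v1 is usually
    mounted as one directory per controller), and "none" otherwise.
    """
    base = Path(root)
    # The root of a v2 hierarchy exposes cgroup.controllers.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # No v2 marker: any sub-directory suggests per-controller v1 mounts.
    if base.is_dir() and any(child.is_dir() for child in base.iterdir()):
        return "v1-or-hybrid"
    return "none"
```

Calling cgroup_layout() with no arguments inspects the assumed default mount point; passing another directory is mainly useful for testing.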

Features

One of the design goals of cgroups is to provide a unified interface to many different use cases, from controlling single processes (by using nice, for example) to full operating system-level virtualization (as provided by OpenVZ, Linux-VServer or LXC, for example). Cgroups provides:

Resource limiting: groups can be set not to exceed a configured memory limit, which also includes the file system cache,^[10]^[11] I/O bandwidth limit,^[12] CPU quota limit,^[13] CPU set limit,^[14] or maximum open files.^[15]
Prioritization: some groups may get a larger share of CPU utilization^[16] or disk I/O throughput^[17]
Accounting: measures a group's resource usage, which may be used, for example, for billing purposes^[18]
Control: freezing groups of processes, their checkpointing and restarting^[18]
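
In cgroup v2, limits and controls of this kind are set by writing values to files inside a group's directory. The sketch below assumes the v2 interface files memory.max, cpu.max and cgroup.freeze described in the kernel documentation; the helper names are hypothetical, the directory passed in must already be a cgroup, and writing to a real cgroup usually requires elevated privileges.

```python
from pathlib import Path


def cpu_max_value(cpus: float, period_us: int = 100_000) -> str:
    """Format a cpu.max value: "<quota> <period>", both in microseconds."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return f"{round(cpus * period_us)} {period_us}"


def apply_limits(cgroup_dir: str, memory_bytes: int, cpus: float) -> None:
    """Write a memory limit and a CPU quota into a cgroup v2 directory."""
    group = Path(cgroup_dir)
    (group / "memory.max").write_text(f"{memory_bytes}\n")
    (group / "cpu.max").write_text(cpu_max_value(cpus) + "\n")


def set_frozen(cgroup_dir: str, frozen: bool) -> None:
    """Freeze (True) or thaw (False) the group via cgroup.freeze."""
    (Path(cgroup_dir) / "cgroup.freeze").write_text("1\n" if frozen else "0\n")
```

For example, cpu_max_value(0.5) yields "50000 100000", which allows the group half of one CPU in every 100 ms period.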

Use

A control group (abbreviated as cgroup) is a collection of processes that are bound by the same criteria and associated with a set of parameters or limits. These groups can be hierarchical, meaning that each group inherits limits from its parent group. The kernel provides access to multiple controllers (also called subsystems) through the cgroup interface;^[3] for example, the "memory" controller limits memory use, "cpuacct" accounts CPU usage, etc.
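
The hierarchical behaviour can be illustrated with a small model: because a child group is also constrained by every ancestor, its effective limit is the tightest limit found on the path to the root. This is a simplified illustration in plain Python, not a kernel interface; the function name and the dictionary layout are assumptions made for the example.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def effective_limit(path: str, limits: Dict[str, Optional[int]]) -> Optional[int]:
    """Return the tightest limit among `path` and all of its ancestors.

    `limits` maps a cgroup path to its configured limit in bytes; a
    missing entry or None means "no limit configured at this level".
    """
    node = PurePosixPath(path)
    candidates = []
    # Walk from the group itself up to the root of the hierarchy.
    for level in (node, *node.parents):
        value = limits.get(str(level))
        if value is not None:
            candidates.append(value)
    return min(candidates) if candidates else None
```

With limits of 1000 bytes on /web and 4000 bytes on /web/api, the effective limit of /web/api is 1000, since the parent's limit is the tighter of the two.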

Control groups can be used in multiple ways:

By accessing the cgroup virtual file system manually.
By creating and managing groups on the fly using tools like cgcreate, cgexec, and cgclassify (from libcgroup).
Through the "rules engine daemon" that can automatically move processes of certain users, groups, or commands to cgroups as specified in its configuration.
Indirectly through other software that uses cgroups, such as Docker, Firejail, LXC,^[19] libvirt, systemd, Open Grid Scheduler/Grid Engine,^[20] and Google's developmentally defunct lmctfy.
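
As an illustration of the first approach, the cgroup membership of a process can be read from /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following sketch parses that format; the type and function names are hypothetical, and the sample content in the usage note is made up for illustration.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_proc_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup into structured entries."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at most twice: the cgroup path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(
            CgroupEntry(
                hierarchy_id=int(hierarchy_id),
                # v2 entries have an empty controller list ("0::/path").
                controllers=[c for c in controllers.split(",") if c],
                path=path,
            )
        )
    return entries
```

On a live system the input would typically come from open("/proc/self/cgroup").read(). A v2 entry such as 0::/user.slice is returned with hierarchy ID 0 and an empty controller list.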

The Linux kernel documentation contains some technical details of the setup and use of control groups version 1^[21] and version 2.^[22] The systemd-cgtop^[23] command can be used to show top control groups by their resource usage.
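
A ranking of groups by resource usage can be approximated by reading the accounting files directly. The sketch below is not how systemd-cgtop is implemented; it assumes a cgroup v2 tree mounted at /sys/fs/cgroup in which non-root groups with the memory controller enabled expose a memory.current file, and the function name is hypothetical.

```python
import os
from typing import List, Tuple


def top_by_memory(root: str = "/sys/fs/cgroup", count: int = 10) -> List[Tuple[str, int]]:
    """Return up to `count` (cgroup path, bytes) pairs, largest usage first."""
    usage = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.current" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "memory.current")) as handle:
                value = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # the group vanished or the file was unreadable
        relative = os.path.relpath(dirpath, root)
        usage.append(("/" if relative == "." else "/" + relative, value))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:count]
```

Because cgroups can appear and disappear while the tree is being walked, read errors are skipped rather than treated as fatal.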

Redesign

Redesign of cgroups started in 2013,^[24] with additional changes brought by versions 3.15 and 3.16 of the Linux kernel.^[25]^[26]^[27]

Namespace isolation

While not technically part of the cgroups work, a related feature of the Linux kernel is namespace isolation, where groups of processes are separated such that they cannot "see" resources in other groups. For example, a PID namespace provides a separate enumeration of process identifiers within each namespace. Also available are mount, user, UTS (Unix Time Sharing), network and SysV IPC namespaces.

The PID namespace provides isolation for the allocation of process identifiers (PIDs), lists of processes and their details. While the new namespace is isolated from other siblings, processes in its "parent" namespace still see all processes in child namespaces—albeit with different PID numbers.^[28]
Network namespace isolates the network interface controllers (physical or virtual), iptables firewall rules, routing tables etc. Network namespaces can be connected with each other using the "veth" virtual Ethernet device.^[29]
"UTS" namespace allows changing the hostname.
Mount namespace allows creating a different file system layout, or making certain mount points read-only.^[30]
IPC namespace isolates the System V inter-process communication between namespaces.
User namespace isolates the user IDs between namespaces.^[31]
Cgroup namespace^[32]

Namespaces are created with the "unshare" command or syscall, or as "new" flags in a "clone" syscall.^[33]
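
The namespaces a process belongs to can be inspected through the symbolic links in /proc/<pid>/ns, whose targets have the form type:[inode]; two processes are in the same namespace of a given type when those inode numbers match. The following is a minimal sketch of reading these links, with hypothetical function names; it inspects namespaces only and does not create them.

```python
import os
import re
from typing import Dict, Tuple

_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a link target such as "pid:[4026531836]" into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespaces_of(pid: str = "self", proc_root: str = "/proc") -> Dict[str, int]:
    """Map each entry in <proc_root>/<pid>/ns to its namespace inode number."""
    ns_dir = os.path.join(proc_root, pid, "ns")
    result = {}
    for name in os.listdir(ns_dir):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result
```

Comparing namespaces_of("self")["net"] with the value for another PID shows whether the two processes share a network namespace; reading another user's process entries may require additional privileges.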

The "ns" subsystem was added early in cgroups development to integrate namespaces and control groups. If the "ns" cgroup was mounted, each namespace would also create a new group in the cgroup hierarchy. This was an experiment that was later judged to be a poor fit for the cgroups API, and removed from the kernel.

Linux namespaces were inspired by the more general namespace functionality used heavily throughout Plan 9 from Bell Labs.^[34]

Unified hierarchy

Kernfs was introduced into the Linux kernel with version 3.14 in March 2014, the main author being Tejun Heo.^[35] One of the main motivators for a separate kernfs is the cgroups file system. Kernfs is basically created by splitting off some of the sysfs logic into an independent entity, thus easing for other kernel subsystems the implementation of their own virtual file system with handling for device connect and disconnect, dynamic creation and removal, and other attributes. Redesign continued into version 3.15 of the Linux kernel.^[36]

Kernel memory control groups (kmemcg)

Kernel memory control groups (kmemcg) were merged into version 3.8 of the Linux kernel mainline, released on 18 February 2013.^[37]^[38]^[39] The kmemcg controller can limit the amount of memory that the kernel can utilize to manage its own internal processes.

cgroup awareness of OOM killer

Linux kernel 4.19 (October 2018) made the OOM killer cgroup-aware, adding the ability to kill a cgroup as a single unit and thereby guarantee the integrity of the workload.^[40]
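
In the cgroup v2 interface this behaviour is controlled per group through the memory.oom.group file, and OOM activity is reported in the flat-keyed memory.events file. The sketch below assumes those two file names from the kernel documentation; the helper names are hypothetical, and writing to a real cgroup usually requires elevated privileges.

```python
from pathlib import Path
from typing import Dict


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse "key value" lines, as used by files such as memory.events."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            counters[parts[0]] = int(parts[1])
    return counters


def enable_group_oom_kill(cgroup_dir: str) -> None:
    """Ask the OOM killer to treat the whole group as a single unit."""
    (Path(cgroup_dir) / "memory.oom.group").write_text("1\n")


def oom_kills(cgroup_dir: str) -> int:
    """Return the oom_kill counter from the group's memory.events file."""
    events = parse_flat_keyed((Path(cgroup_dir) / "memory.events").read_text())
    return events.get("oom_kill", 0)
```

A supervisor could poll oom_kills() for a group and restart the whole workload when the counter increases, rather than leaving it running with some of its processes missing.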

Adoption

Various projects use cgroups as their basis, including CoreOS, Docker (in 2013), Hadoop, Jelastic, Kubernetes,^[41] lmctfy (Let Me Contain That For You), LXC (Linux Containers), systemd, Mesos and Mesosphere,^[41] and HTCondor.

Major Linux distributions have also adopted cgroups, such as Red Hat Enterprise Linux (RHEL) 6.0 in November 2010, nearly three years after the feature was merged into the mainline Linux kernel.^[42]

On 29 October 2019, the Fedora Project modified Fedora 31 to use cgroups v2 by default.^[43]

References

[1] "Control Group v2 — The Linux Kernel documentation". www.kernel.org. Retrieved 24 June 2023.
[2] Jonathan Corbet (29 May 2007). "Process containers". LWN.net.
[3] Jonathan Corbet (29 October 2007). "Notes from a container". LWN.net. Retrieved 14 April 2015. The original 'containers' name was considered to be too generic – this code is an important part of a container solution, but it's far from the whole thing. So containers have now been renamed 'control groups' (or 'cgroups') and merged for 2.6.24.
[4] "cgroup: convert to kernfs". Linux kernel mailing list. 28 January 2014.
[5] "netfilter: x_tables: lightweight process control group matching". 23 April 2014. Archived from the original on 24 April 2014.
[6] "cgroup: prepare for the default unified hierarchy". 13 March 2014.
[7] "Documentation/cgroup-v2.txt as appeared in Linux kernel 4.5". 14 March 2016.
[8] "cgroup v2: Multiple hierarchies".
[9] "diff between Linux kernel 4.4 and 4.5". 14 March 2016.
[10] Jonathan Corbet (31 July 2007). "Controlling memory use in containers". LWN.
[11] Balbir Singh, Vaidynathan Srinivasan (July 2007). "Containers: Challenges with the memory resource controller and its performance" (PDF). Ottawa Linux Symposium.
[12] Carvalho, André (18 October 2017). "Using cgroups to limit I/O". andrestc.com. Retrieved 12 September 2022.
[13] Luu, Dan. "The container throttling problem". danluu.com. Retrieved 12 September 2022.
[14] Derr, Simon (2004). "CPUSETS". Retrieved 12 September 2022.
[15] "setrlimit(2) — Arch manual pages". man.archlinux.org. Retrieved 27 November 2023.
[16] Jonathan Corbet (23 October 2007). "Kernel space: Fair user scheduling for Linux". Network World. Archived from the original on 19 October 2013. Retrieved 22 August 2012.
[17] Kamkamezawa Hiroyu (19 November 2008). Cgroup and Memory Resource Controller (PDF). Japan Linux Symposium. Archived from the original (PDF presentation slides) on 22 July 2011.
[18] Hansen D, IBM Linux Technology Center (2009). Resource Management (PDF presentation slides). Linux Foundation.
[19] Matt Helsley (3 February 2009). "LXC: Linux container tools". IBM developerWorks.
[20] "Grid Engine cgroups Integration". Scalable Logic. 22 May 2012.
[21] "cgroups". kernel.org.
[22] "Torvalds/Linux". GitHub. 13 February 2022.
[23] "Systemd-cgtop".
[24] "All About the Linux Kernel: Cgroup's Redesign". Linux.com. 15 August 2013. Archived from the original on 28 April 2019. Retrieved 19 May 2014.
[25] "The unified control group hierarchy in 3.16". LWN.net. 11 June 2014.
[26] "Pull cgroup updates for 3.15 from Tejun Heo". kernel.org. 3 April 2014.
[27] "Pull cgroup updates for 3.16 from Tejun Heo". kernel.org. 9 June 2014.
[28] Pavel Emelyanov, Kir Kolyshkin (19 November 2007). "PID namespaces in the 2.6.24 kernel". LWN.net.
[29] Jonathan Corbet (30 January 2007). "Network namespaces". LWN.net.
[30] Serge E. Hallyn, Ram Pai (17 September 2007). "Applying mount namespaces". IBM developerWorks.
[31] Michael Kerrisk (27 February 2013). "Namespaces in operation, part 5: User namespaces". lwn.net Linux Info from the Source.
[32] "LKML: Linus Torvalds: Linux 4.6-rc1".
[33] Janak Desai (11 January 2006). "Linux kernel documentation on unshare".
[34] "The Use of Name Spaces in Plan 9". 1992. Archived from the original on 6 September 2014. Retrieved 15 February 2015.
[35] "kernfs, sysfs, driver-core: implement synchronous self-removal". LWN.net. 3 February 2014. Retrieved 7 April 2014.
[36] "Linux kernel source tree: kernel/git/torvalds/linux.git: cgroups: convert to kernfs". kernel.org. 11 February 2014. Retrieved 23 May 2014.
[37] "memcg: kmem controller infrastructure". kernel.org source code. 18 December 2012.
[38] "memcg: kmem accounting basic infrastructure". kernel.org source code. 18 December 2012.
[39] "memcg: add documentation about the kmem controller". kernel.org. 18 December 2012.
[40] "Linux_4.19 - Linux Kernel Newbies".
[41] "Mesosphere to Bring Google's Kubernetes to Mesos". Mesosphere.io. 10 July 2014. Archived from the original on 6 September 2015. Retrieved 13 July 2014.
[42] "6.0 Release Notes" (PDF). redhat.com. Retrieved 12 September 2023.
[43] "1732114 – Modify Fedora 31 to use CgroupsV2 by default".

External links

Official Linux kernel documentation on cgroups v1 and cgroups v2
Red Hat Resource Management Guide on cgroups
Ubuntu manpage on cgroups Archived 9 August 2021 at the Wayback Machine
Linux kernel Namespaces and cgroups by Rami Rosen (2013)
Namespaces and cgroups, the basis of Linux containers (including cgroups v2), slides of a talk by Rami Rosen, Netdev 1.1, Seville, Spain, 2016
Understanding the new control groups API, LWN.net, by Rami Rosen, March 2016
Large-scale cluster management at Google with Borg, April 2015, by Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune and John Wilkes
Job Objects, similar feature on Windows

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
