Copy-on-write

Last updated

Copy-on-write (COW), also called implicit sharing [1] or shadowing, [2] is a resource-management technique [3] used in programming to manage shared data efficiently. Instead of copying data right away when multiple programs use it, the same data is shared between programs until one tries to modify it. If no changes are made, no private copy is created, saving resources. [3] A copy is only made when needed, ensuring each program has its own version when modifications occur. This technique is commonly applied to memory, files, and data structures.

Contents

In virtual memory management

Copy-on-write finds its main use in operating systems, sharing the physical memory of computers running multiple processes, in the implementation of the fork() system call. Typically, the new process does not modify any memory and immediately executes a new process, replacing the address space entirely. It would waste processor time and memory to copy all of the old process's memory during the fork only to immediately discard the copy. [4]

Copy-on-write can be implemented efficiently using the page table by marking certain pages of memory as read-only and keeping a count of the number of references to the page. When data is written to these pages, the operating-system kernel intercepts the write attempt and allocates a new physical page, initialized with the copy-on-write data, although the allocation can be skipped if there is only one reference. The kernel then updates the page table with the new (writable) page, decrements the number of references, and performs the write. The new allocation ensures that a change in the memory of one process is not visible in another's.[ citation needed ]

The copy-on-write technique can be extended to support efficient memory allocation by keeping one page of physical memory filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space. The combined algorithm is similar to demand paging. [3]

Copy-on-write pages are also used in the Linux kernel's same-page merging feature. [5]

In software

COW is also used in library, application, and system code.

Examples

The string class provided by the C++ standard library was specifically designed to allow copy-on-write implementations in the initial C++98 standard, [6] but not in the newer C++11 standard: [7]

std::stringx("Hello");std::stringy=x;// x and y use the same buffer.y+=", World!";// Now y uses a different buffer; x still uses the same old buffer.

In the PHP programming language, all types except references are implemented as copy-on-write. For example, strings and arrays are passed by reference, but when modified, they are duplicated if they have non-zero reference counts. This allows them to act as value types without the performance problems of copying on assignment or making them immutable. [8]

In the Qt framework, many types are copy-on-write ("implicitly shared" in Qt's terms). Qt uses atomic compare-and-swap operations to increment or decrement the internal reference counter. Since the copies are cheap, Qt types can often be safely used by multiple threads without the need of locking mechanisms such as mutexes. The benefits of COW are thus valid in both single- and multithreaded systems. [9]

In computer storage

COW is used as the underlying mechanism in file systems like ZFS, Btrfs, ReFS, and Bcachefs, [10] as well as in logical volume management and database servers such as Microsoft SQL Server.

In traditional file systems, file changes overwrite the original data. With COW, when changes are made, a new version of the file is created while keeping the original intact. This approach enables features like snapshots, which capture the state of a file at a specific time without consuming much additional space. Snapshots typically store only the modified data and are kept close to the original. However, they are considered a weak form of incremental backup and cannot replace a full backup. [11]

See also

Related Research Articles

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

<span class="mw-page-title-main">Operating system</span> Software that manages computer hardware resources

An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

<span class="mw-page-title-main">Virtual memory</span> Computer memory management technique

In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large (main) memory".

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

ext3, or third extended filesystem, is a journaled file system that is commonly used with the Linux kernel. It used to be the default file system for many popular Linux distributions but generally has been supplanted by its successor version ext4. The main advantage of ext3 over its predecessor, ext2, is journaling, which improves reliability and eliminates the need to check the file system after an improper, a.k.a. unclean, shutdown.

In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.

In computing, particularly in the context of the Unix operating system and its workalikes, fork is an operation whereby a process creates a copy of itself. It is an interface which is required for compliance with the POSIX and Single UNIX Specification standards. It is usually implemented as a C standard library wrapper to the fork, clone, or other system calls of the kernel. Fork is the primary method of process creation on Unix-like operating systems.

In computer operating systems, memory paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. In this scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.

In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes to store volumes. In particular, a volume manager can concatenate, stripe together or otherwise combine partitions into larger virtual partitions that administrators can re-size or move, potentially without interrupting system use.

<span class="mw-page-title-main">Linux-VServer</span> OS-level virtualisation

Linux-VServer is a virtual private server implementation that was created by adding operating system-level virtualization capabilities to the Linux kernel. It is developed and distributed as open-source software.

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

"Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another or in which unnecessary data copies are avoided. This is frequently used to save CPU cycles and memory bandwidth in many time consuming tasks, such as when transmitting a file at high speed over a network, etc., thus improving the performance of programs (processes) executed by a computer.

OS-level virtualization is an operating system (OS) virtualization paradigm in which the kernel allows the existence of multiple isolated user space instances, including containers, zones, virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels, and jails. Such instances may look like real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can see all resources of that computer. Programs running inside a container can only see the container's contents and devices assigned to the container.

splice is a Linux-specific system call that moves data between a file descriptor and a pipe without a round trip to user space. The related system call vmsplice moves or copies data between a pipe and user space. Ideally, splice and vmsplice work by remapping pages and do not actually copy any data, which may improve I/O performance. As linear addresses do not necessarily correspond to contiguous physical addresses, this may not be possible in all cases and on all hardware combinations.

In computing, a page cache, sometimes also called disk cache, is a transparent cache for the pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page cache in otherwise unused portions of the main memory (RAM), resulting in quicker access to the contents of cached pages and overall performance improvements. A page cache is implemented in kernels with the paging memory management, and is mostly transparent to applications.

Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was created by Chris Mason in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel.

In computing, kernel same-page merging (KSM), also known as kernel shared memory, memory merging, memory deduplication, and page deduplication is a kernel feature that makes it possible for a hypervisor system to share memory pages that have identical contents between multiple processes or virtualized guests. While not directly linked, Kernel-based Virtual Machine (KVM) can use KSM to merge memory pages occupied by virtual machines.

<span class="mw-page-title-main">Shared memory</span> Computer memory that can be accessed by multiple processes

In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Depending on context, programs may run on a single processor or on multiple separate processors.

Bcachefs is a copy-on-write (COW) file system for Linux-based operating systems. Its primary developer, Kent Overstreet, first announced it in 2015, and it was added to the Linux kernel beginning with 6.7. It is intended to compete with the modern features of ZFS or Btrfs, and the speed and performance of ext4 or XFS.

References

  1. "Implicit Sharing". Qt Project. Archived from the original on 8 February 2024. Retrieved 10 November 2023.
  2. Rodeh, Ohad (1 February 2008). "B-Trees, Shadowing, and Clones" (PDF). ACM Transactions on Storage. 3 (4): 1. CiteSeerX   10.1.1.161.6863 . doi:10.1145/1326542.1326544. S2CID   207166167. Archived from the original (PDF) on 2 January 2017. Retrieved 10 November 2023.
  3. 1 2 3 Bovet, Daniel Pierre; Cesati, Marco (1 January 2002). Understanding the Linux Kernel. O'Reilly Media. p. 295. ISBN   9780596002138. Archived from the original on 15 September 2024. Retrieved 10 November 2023.
  4. Silberschatz, Abraham; Galvin, Peter B.; Gagne, Greg (2018). Operating System Concepts (10th ed.). Wiley. pp. 120–123. ISBN   978-1119456339.
  5. Abbas, Ali. "The Kernel Samepage Merging Process". alouche.net. Archived from the original on 8 August 2016. Retrieved 10 November 2023.{{cite web}}: CS1 maint: unfit URL (link)
  6. Meyers, Scott (2012). Effective STL. Addison-Wesley. pp. 64–65. ISBN   9780132979184.
  7. "Concurrency Modifications to Basic String". Open Standards. Archived from the original on 10 November 2023. Retrieved 10 November 2023.
  8. Pauli, Julien; Ferrara, Anthony; Popov, Nikita (2013). "Memory management". PhpInternalsBook.com. Archived from the original on 10 November 2023. Retrieved 10 November 2023.
  9. "Threads and Implicitly Shared Classes". Qt Project. Archived from the original on 3 December 2023. Retrieved 10 November 2023.
  10. Kasampalis, Sakis (2010). "Copy-on-Write Based File Systems Performance Analysis and Implementation" (PDF). p. 19. Archived (PDF) from the original on 5 May 2024. Retrieved 10 November 2023.
  11. Chien, Tim. "Snapshots Are NOT Backups". Oracle.com. Oracle. Archived from the original on 10 November 2023. Retrieved 10 November 2023.