Working set

Last updated May 13, 2023

Working set is a concept in computer science which defines the amount of memory that a process requires in a given time interval.^[1]

Definition

Peter Denning (1968) defines "the working set of information $W(t,\tau )$ of a process at time $t$ to be the collection of information referenced by the process during the process time interval $(t-\tau ,t)$ ".^[2] Typically the units of information in question are considered to be memory pages. This is suggested to be an approximation of the set of pages that the process will access in the future (say during the next $\tau$ time units), and more specifically is suggested to be an indication of what pages ought to be kept in main memory to allow most progress to be made in the execution of that process.

Rationale

The effect of the choice of what pages to be kept in main memory (as distinct from being paged out to auxiliary storage) is important: if too many pages of a process are kept in main memory, then fewer other processes can be ready at any one time. If too few pages of a process are kept in main memory, then its page fault frequency is greatly increased and the number of active (non-suspended) processes currently executing in the system approaches zero.

The working set model states that a process can be in RAM if and only if all of the pages that it is currently using (often approximated by the most recently used pages) can be in RAM. The model is an all or nothing model, meaning if the pages it needs to use increases, and there is no room in RAM, the process is swapped out of memory to free the memory for other processes to use.

Often a heavily loaded computer has so many processes queued up that, if all the processes were allowed to run for one scheduling time slice, they would refer to more pages than there is RAM, causing the computer to "thrash".

By swapping some processes from memory, the result is that processes—even processes that were temporarily removed from memory—finish much sooner than they would if the computer attempted to run them all at once. The processes also finish much sooner than they would if the computer only ran one process at a time to completion since it allows other processes to run and make progress during times that one process is waiting on the hard drive or some other global resource.

In other words, the working set strategy prevents thrashing while keeping the degree of multiprogramming as high as possible. Thus it optimizes CPU utilization and throughput.

Implementation

The main hurdle in implementing the working set model is keeping track of the working set. The working set window is a moving window. At each memory reference a new reference appears at one end and the oldest reference drops off the other end. A page is in the working set if it is referenced in the working set window.

To avoid the overhead of keeping a list of the last k referenced pages, the working set is often implemented by keeping track of the time t of the last reference, and considering the working set to be all pages referenced within a certain period of time.

The working set isn't a page replacement algorithm, but page-replacement algorithms can be designed to only remove pages that aren't in the working set for a particular process. One example is a modified version of the clock algorithm called WSClock.

Variants

Working set can be divided into code working set and data working set. This distinction is important when code and data are separate at the relevant level of the memory hierarchy, as if either working set does not fit in that level of the hierarchy, thrashing will occur. In addition to the code and data themselves, on systems with virtual memory, the memory map (of virtual memory to physical memory) entries of the pages of the working set must be cached in the translation lookaside buffer (TLB) for the process to progress efficiently. This distinction exists because code and data are cached in small blocks (cache lines), not entire pages, but address lookup is done at the page level. Thus even if the code and data working sets fit into cache, if the working sets are split across many pages, the virtual address working set may not fit into TLB, causing TLB thrashing.

Analogs of working set exist for other limited resources, most significantly processes. If a set of processes requires frequent interaction between multiple processes, then it has a process working set that must be coscheduled in order to progress:^[3]

parallel programs have a process working set that must be coscheduled (scheduled for execution simultaneously) for the parallel program to make progress.

If the processes are not scheduled simultaneously – for example, if there are two processes but only one core on which to execute them – then the processes can only advance at the rate of one interaction per time slice.

Other resources include file handles or network sockets – for example, copying one file to another is most simply done with two file handles: one for input, one for output, and thus has a "file handle working set" size of two. If only one file handle is available, copying can still be done, but requires acquiring a file handle for the input, reading from it (say into a buffer), releasing it, then acquiring a file handle for the output, writing to it, releasing it, then acquiring the input file handle again and repeating. Similarly a server may require many sockets, and if it is limited would need to repeatedly release and re-acquire sockets. Rather than thrashing, these resources are typically required for the program, and if it cannot acquire enough resources, it simply fails.

Related Research Articles

In computing, virtual memory, or virtual storage is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large (main) memory".

In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference locality – temporal and spatial locality. Temporal locality refers to the reuse of specific data and/or resources within a relatively small time duration. Spatial locality refers to the use of data elements within relatively close storage locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, such as traversing the elements in a one-dimensional array.

In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algorithm can be measured based on the usage of different resources. Algorithmic efficiency can be thought of as analogous to engineering productivity for a repeating or continuous process.

A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical addresses.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term". Usually the terms are various constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for algebraic data types, data structures, or other components.

In computer operating systems, memory paging is a memory management scheme by which a computer stores and retrieves data from secondary storage for use in main memory. In this scheme, the operating system retrieves data from secondary storage in same-size blocks called pages. Paging is an important part of virtual memory implementations in modern operating systems, using secondary storage to let programs exceed the size of available physical memory.

A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. It is a part of the chip's memory-management unit (MMU). A TLB may reside between the CPU and the CPU cache, between CPU cache and the main memory or between the different levels of the multi-level cache. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and it is nearly always present in any processor that utilizes paged or segmented virtual memory.

In computer science, thrashing occurs when a computer's virtual memory resources are overused, leading to a constant state of paging and page faults, inhibiting most application-level processing. This causes the performance of the computer to degrade or collapse. The situation can continue indefinitely until either the user closes some running applications or the active processes free up additional virtual memory resources.

A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, while physical addresses are used by the hardware, or more specifically, by the random-access memory (RAM) subsystem. The page table is a key component of virtual address translation that is necessary to access data in memory.

In a computer operating system that uses paging for virtual memory management, page replacement algorithms decide which memory pages to page out, sometimes called swap out, or write to disk, when a page of memory needs to be allocated. Page replacement happens when a requested page is not in memory and a free page cannot be used to satisfy the allocation, either because there are none, or because the number of free pages is lower than some threshold.

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels, with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels, or even any level, sometimes some latter or all levels are implemented with eDRAM.

In computing, cache algorithms are optimizing instructions, or algorithms, that a computer program or a hardware-maintained structure can utilize in order to manage a cache of information stored on the computer. Caching improves performance by keeping recent or often-used data items in memory locations that are faster or computationally cheaper to access than normal memory stores. When the cache is full, the algorithm must choose which items to discard to make room for the new ones.

Peter James Denning is an American computer scientist and writer. He is best known for pioneering work in virtual memory, especially for inventing the working-set model for program behavior, which addressed thrashing in operating systems and became the reference standard for all memory management policies. He is also known for his works on principles of operating systems, operational analysis of queueing network systems, design and implementation of CSNET, the ACM digital library, codifying the great principles of computing, and most recently for the book The Innovator's Way, on innovation as a set of learnable practices.

In computing, a system resource, or simple resource, is any physical or virtual component of limited availability within a computer system. All connected devices and internal system components are resources. Virtual system resources include files, network connections, and memory areas.

In computing, external memory algorithms or out-of-core algorithms are algorithms that are designed to process data that are too large to fit into a computer's main memory at once. Such algorithms must be optimized to efficiently fetch and access data stored in slow bulk memory such as hard drives or tape drives, or when memory is on a computer network. External memory algorithms are analyzed in the external memory model.

In computer storage, fragmentation is a phenomenon in which storage space, main storage or secondary storage, is used inefficiently, reducing capacity or performance and often both. The exact consequences of fragmentation depend on the specific system of storage allocation in use and the particular form of fragmentation. In many cases, fragmentation leads to storage space being "wasted", and in that case the term also refers to the wasted space itself.

<span class="mw-page-title-main">V-11</span>

The V-11, code-named "Scorpio", is a miniprocessor chip set implementation of the VAX instruction set architecture (ISA) developed and fabricated by Digital Equipment Corporation (DEC).

Virtual memory compression is a memory management technique that utilizes data compression to reduce the size or number of paging requests to and from the auxiliary storage. In a virtual memory compression system, pages to be paged out of virtual memory are compressed and stored in physical memory, which is usually random-access memory (RAM), or sent as compressed to auxiliary storage such as a hard disk drive (HDD) or solid-state drive (SSD). In both cases the virtual memory range, whose contents has been compressed, is marked inaccessible so that attempts to access compressed pages can trigger page faults and reversal of the process. The footprint of the data being paged is reduced by the compression process; in the first instance, the freed RAM is returned to the available physical memory pool, while the compressed portion is kept in RAM. In the second instance, the compressed data is sent to auxiliary storage but the resulting I/O operation is smaller and therefore takes less time.

In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Depending on context, programs may run on a single processor or on multiple separate processors.

Power10 is a superscalar, multithreading, multi-core microprocessor family, based on the open source Power ISA, and announced in August 2020 at the Hot Chips conference; systems with Power10 CPUs. Generally available from September 2021 in the IBM Power10 Enterprise E1080 server.

References

↑ Denning, Peter J. (2021-02-02). "Working Set Analytics". ACM Computing Surveys. Association for Computing Machinery (ACM). 53 (6): 1–36. doi:10.1145/3399709. ISSN 0360-0300.
↑ Denning, Peter J. (1968). "The working set model for program behavior" (PDF). Communications of the ACM. 11 (5): 323–333. doi:10.1145/363095.363141. S2CID 207669410.
↑ Ousterhout, J. K. (1982). "Scheduling Techniques for Concurrent Systems" (PDF). Proceedings of Third International Conference on Distributed Computing Systems: 22–30.