Page cache

In computing, a page cache, sometimes also called disk cache, [1] is a transparent cache for pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page cache in otherwise unused portions of main memory (RAM), resulting in quicker access to the contents of cached pages and overall performance improvements. A page cache is implemented in kernels that use paging for memory management, and it is mostly transparent to applications.

Usually, all physical memory not directly allocated to applications is used by the operating system for the page cache. Since the memory would otherwise be idle and is easily reclaimed when applications request it, there is generally no associated performance penalty and the operating system might even report such memory as "free" or "available".
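
On Linux, this accounting can be observed directly in /proc/meminfo: the MemAvailable figure includes page cache memory that can be cheaply reclaimed, while Cached approximates the size of the page cache itself. A minimal, Linux-specific C sketch (field names as documented in proc(5)):

```c
/* Print the /proc/meminfo fields relevant to the page cache.
 * MemAvailable counts easily reclaimable cache as available;
 * Cached is (roughly) the page cache itself. Linux-specific. */
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f) { perror("fopen"); return 1; }
    char line[128];
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "MemFree:", 8) == 0 ||
            strncmp(line, "MemAvailable:", 13) == 0 ||
            strncmp(line, "Cached:", 7) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```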

When compared to main memory, hard disk drive reads and writes are slow, and random accesses require expensive disk seeks; as a result, larger amounts of main memory bring performance improvements, since more data can be cached in memory. [2] Separate disk caching is provided on the hardware side by dedicated RAM or NVRAM chips located either in the disk controller (in which case the cache is integrated into the hard disk drive and usually called a disk buffer [3] ) or in a disk array controller. Such memory should not be confused with the page cache. The operating system may also use part of main memory as a filesystem write buffer, sometimes called a page buffer. [4]

Memory conservation

Pages in the page cache that are modified after being brought in are called dirty pages. [5] Since non-dirty pages in the page cache have identical copies in secondary storage (e.g. hard disk drive or solid-state drive), discarding and reusing their space is much quicker than paging out application memory, and is often preferred over flushing the dirty pages into secondary storage and reusing their space. Executable binaries, such as applications and libraries, are also typically accessed through the page cache and mapped into individual process address spaces using virtual memory (this is done through the mmap system call on Unix-like operating systems). This not only means that the binary files are shared between separate processes, but also that unused parts of binaries will eventually be flushed out of main memory, conserving memory.
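
As a sketch of how such a mapping looks from user space on a Unix-like system, the following maps a file read-only with mmap; the kernel serves the pages straight from the page cache, and every process mapping the same file shares the same physical pages. The path /bin/ls is only an illustrative choice of executable:

```c
/* Map a file the way a program loader maps code: read-only pages
 * backed by the page cache and shared between processes. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *path = "/bin/ls";       /* any executable or library */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* MAP_PRIVATE + PROT_READ: the pages come from the page cache;
     * all processes mapping this file share the same physical pages
     * until one of them writes (copy-on-write). */
    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first bytes of %s: %02x %02x %02x %02x\n", path,
           ((unsigned char *)p)[0], ((unsigned char *)p)[1],
           ((unsigned char *)p)[2], ((unsigned char *)p)[3]);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```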

Since cached pages can be easily evicted and re-used, some operating systems, notably Windows NT, even report the page cache usage as "available" memory, while the memory is actually allocated to disk pages. This has led to some confusion about the utilization of the page cache in Windows.

Disk writes

The page cache also aids in writing to a disk. Pages in main memory that have been modified while writing data are marked "dirty" and have to be flushed to disk before they can be freed. When a file write occurs, the cached page for the particular block is looked up. If it is found in the page cache, the write is done to that page in main memory. If it is not found, then, provided the write falls exactly on page-size boundaries, the page is not even read from disk, but is allocated and immediately marked dirty. Otherwise, the page(s) are fetched from disk and the requested modifications are applied. A file that is created or opened in the page cache, but whose data has not yet been written back, might appear as a zero-byte file at a later read, for example after a crash before writeback.
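
From the application side on a Unix-like system, this behavior can be sketched as follows: write(2) normally completes as soon as the data sits in the page cache as dirty pages, and fsync(2) forces those pages out to the device. The file name demo.txt is illustrative:

```c
/* A write(2) normally lands in the page cache as dirty pages;
 * fsync(2) forces this file's dirty pages out to the device. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd = open("demo.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    const char msg[] = "hello, page cache\n";
    if (write(fd, msg, strlen(msg)) < 0) { perror("write"); return 1; }
    /* At this point the data is typically only in RAM, marked dirty. */

    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    /* Now the kernel has flushed the dirty pages to secondary storage. */

    close(fd);
    return 0;
}
```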

However, not all cached pages can be written to, as program code is often mapped read-only or copy-on-write; in the latter case, modifications to code will be visible only to the process itself and will not be written back to disk.
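
The copy-on-write case can be reproduced directly with a MAP_PRIVATE mapping: a store into the mapping gives the process a private copy of the page, and the underlying file never sees the change. A sketch, assuming a non-empty demo.txt such as the one created above:

```c
/* With MAP_PRIVATE, writing to a mapped page triggers copy-on-write:
 * the process gets a private copy and the file on disk is untouched. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    int fd = open("demo.txt", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    char *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    p[0] = 'X';   /* copy-on-write: only this process sees the change */

    char buf;
    pread(fd, &buf, 1, 0);      /* re-read byte 0 from the file itself */
    printf("mapping: %c, file: %c\n", p[0], buf);   /* they differ */

    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```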

Side-channel attacks

In 2019, security researchers demonstrated side-channel attacks against the page cache: it is possible to bypass privilege separation and exfiltrate data about other processes by systematically monitoring whether certain file pages (for example, executable or library files) are present in the cache or not. [6]
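
The probe primitive behind such attacks can be sketched with mincore(2), which reports, without elevated privileges, which pages of a mapping are currently resident in the page cache; note that Linux has since restricted mincore's behavior as a partial mitigation:

```c
/* Probe page cache residency of a file with mincore(2). */
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1] : "/bin/ls";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    void *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long pagesz = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(npages);

    /* Bit 0 of each vector entry says whether that page is resident. */
    if (mincore(p, st.st_size, vec) == 0) {
        size_t resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
        printf("%s: %zu of %zu pages resident in the page cache\n",
               path, resident, npages);
    }
    free(vec);
    munmap(p, st.st_size);
    close(fd);
    return 0;
}
```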

References

  1. Robert Love (2005-01-12). "Linux Kernel Development (Second Edition), Chapter 15. The Page Cache and Page Writeback". makelinux.net. Sams Publishing. Retrieved 2015-07-24.
  2. "Disk Cache". Webopedia. September 1996.
  3. Mark Kyrnin. "What to Look for in a Hard Drive". about.com. Archived from the original on 2015-04-04. Retrieved 2014-12-20. "A drive's buffer is an amount of RAM on the drive to store frequently accessed data from the drive."
  4. "free(1) — procps — Debian bookworm — Debian Manpages".
  5. "Glossary - TechNet Library". Microsoft.
  6. Gruss, Daniel; Kraft, Erik; Tiwari, Trishita; Schwarz, Michael; Trachtenberg, Ari; Hennessey, Jason; Ionescu, Alex; Fogh, Anders (2019-01-04). "Page Cache Attacks". arXiv: 1901.01161 [cs.CR].