Virtual memory compression

Virtual memory compression (also referred to as RAM compression and memory compression) is a memory management technique that utilizes data compression to reduce the size or number of paging requests to and from auxiliary storage. [1] In a virtual memory compression system, pages to be paged out of virtual memory are compressed and stored in physical memory, which is usually random-access memory (RAM), or sent, still compressed, to auxiliary storage such as a hard disk drive (HDD) or solid-state drive (SSD). In both cases the virtual memory range whose contents have been compressed is marked inaccessible, so that attempts to access compressed pages trigger page faults and reversal of the process (retrieval from auxiliary storage and decompression). The footprint of the data being paged is reduced by the compression process; in the first instance, the freed RAM is returned to the available physical memory pool, while the compressed portion is kept in RAM. In the second instance, the compressed data is sent to auxiliary storage, but the resulting I/O operation is smaller and therefore takes less time. [2] [3]
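
The core mechanism can be illustrated with a minimal sketch. Python and the zlib codec are used here purely for exposition; real systems such as zswap run inside the kernel and typically use faster codecs such as LZO or LZ4:

```python
import zlib

compressed_pool = {}  # page number -> compressed bytes held in RAM

def page_out(page_number: int, page_data: bytes) -> None:
    # Compress the evicted page and keep it in physical memory;
    # the corresponding virtual range is marked inaccessible.
    compressed_pool[page_number] = zlib.compress(page_data)

def handle_page_fault(page_number: int) -> bytes:
    # An access to the page faults; reverse the process by decompressing.
    return zlib.decompress(compressed_pool.pop(page_number))
```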

In some implementations, including zswap, zram and Helix Software Company’s Hurricane, the entire process is implemented in software. In other systems, such as IBM's MXT, the compression process occurs in a dedicated processor that handles transfers between a local cache and RAM.

Virtual memory compression is distinct from garbage collection (GC) systems, which remove unused memory blocks and in some cases consolidate used memory regions, reducing fragmentation and improving efficiency. Virtual memory compression is also distinct from context switching systems, such as Connectix's RAM Doubler (though it also did online compression) and Apple OS 7.1, in which inactive processes are suspended and then compressed as a whole. [4]

Types

There are two general types of virtual memory compression: (1) sending compressed pages to a swap file in main memory, possibly with a backing store in auxiliary storage, [1] [5] [6] and (2) storing compressed pages side-by-side with uncompressed pages. [1]

The first type (1) usually uses an LZ-class dictionary compression algorithm (sometimes combined with entropy coding), such as LZO or LZ4, [6] [5] to compress the pages being swapped out. Once compressed, they are either stored in a swap file in main memory, or written to auxiliary storage, such as a hard disk. [6] [5] A two-stage process can be used instead, in which there are both a backing store in auxiliary storage and a swap file in main memory; pages evicted from the in-memory swap file are written to the backing store, with a greatly decreased bandwidth need due to the compression. This last scheme leverages the benefits of both previous methods: fast in-memory data access, a great increase in the total amount of data that can be swapped out, and a decreased bandwidth requirement for data written to auxiliary storage. [6] [5] [1]
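
A minimal sketch of the two-stage scheme follows; the pool size, the oldest-first eviction, and the zlib codec are illustrative assumptions rather than details of any particular system:

```python
import zlib
from collections import OrderedDict

POOL_LIMIT = 1024     # assumed cap on pages held compressed in RAM
pool = OrderedDict()  # page number -> compressed bytes, oldest first
backing_store = {}    # stand-in for the swap area on auxiliary storage

def swap_out(page_number: int, page_data: bytes) -> None:
    pool[page_number] = zlib.compress(page_data)
    if len(pool) > POOL_LIMIT:
        # Second stage: evict the oldest entry; only the compressed
        # bytes travel to auxiliary storage, reducing I/O bandwidth.
        victim, data = pool.popitem(last=False)
        backing_store[victim] = data

def swap_in(page_number: int) -> bytes:
    data = pool.pop(page_number, None)
    if data is None:
        data = backing_store.pop(page_number)  # read back from disk
    return zlib.decompress(data)
```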

One of the most widely used classes of algorithms for the second type (2), the WK class of compression algorithms, takes advantage of regularities of in-memory data, notably in pointers and integers. [1] Specifically, in target code generated by most high-level programming languages, both integers and pointers are often present in records whose elements are word-aligned. Furthermore, the values stored in integers are usually small, pointers tend to point to nearby locations, and common data patterns such as an all-zero word can be encoded in the compressed output by a very small code. Exploiting these regularities, the WK class of algorithms uses a very small dictionary (16 entries in the case of WKdm) to achieve up to a 2:1 compression ratio while being much faster and incurring less overhead than LZ-class dictionary compression schemes. [1]
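
The word-level classification at the heart of a WK-style compressor can be sketched as follows. The hash function and the unpacked output are simplifications; actual WKdm packs the two-bit tags, four-bit dictionary indexes, and ten-bit partial-match payloads into separate bit-packed arrays:

```python
DICT_SIZE = 16  # WKdm uses a 16-entry direct-mapped dictionary

def wk_classify(words):
    """Classify each 32-bit word as all-zero, an exact dictionary hit,
    a partial hit (upper 22 bits match), or a miss."""
    dictionary = [0] * DICT_SIZE
    tags, payload = [], []
    for w in words:
        slot = (w >> 10) % DICT_SIZE  # toy stand-in for WKdm's hash
        if w == 0:
            tags.append("ZERO")       # costs only a 2-bit tag
        elif w == dictionary[slot]:
            tags.append("EXACT")
            payload.append(slot)      # 4-bit dictionary index
        elif (w >> 10) == (dictionary[slot] >> 10):
            tags.append("PARTIAL")
            payload.append((slot, w & 0x3FF))  # index + low 10 bits
            dictionary[slot] = w
        else:
            tags.append("MISS")
            payload.append(w)         # full 32-bit word
            dictionary[slot] = w
    return tags, payload
```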

Benefits

By reducing the I/O activity caused by paging requests, virtual memory compression can produce overall performance improvements. The degree of performance improvement depends on a variety of factors, including the availability of any compression co-processors, spare bandwidth on the CPU, speed of the I/O channel, speed of the physical memory, and the compressibility of the physical memory contents.

On multi-core, multithreaded CPUs, some benchmarks show performance improvements of over 50%. [7] [8]

In some situations, such as in embedded devices, auxiliary storage is limited or non-existent. In these cases, virtual memory compression can allow a virtual memory system to operate, where otherwise virtual memory would have to be disabled. This allows the system to run certain software which would otherwise be unable to operate in an environment with no virtual memory. [9]

Shortcomings

Low compression ratios

One of the primary issues is the degree to which the contents of physical memory can be compressed under real-world loads. Program code and much of the data held in physical memory is often not highly compressible, since efficient programming techniques and data architectures are designed to automatically eliminate redundancy in data sets. Various studies show typical data compression ratios ranging from 2:1 to 2.5:1 for program data, [10] [11] similar to typically achievable compression ratios with disk compression. [9]

Background I/O

In order for virtual memory compression to provide measurable performance improvements, the throughput of the virtual memory system must improve relative to the uncompressed equivalent. Thus, the additional processing introduced by the compression must not increase the overall latency: the time spent compressing and decompressing must be smaller than the I/O time saved by moving less data. However, in I/O-bound systems or applications with highly compressible data sets, the gains can be substantial. [9]
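
A back-of-envelope comparison makes the trade-off concrete; all figures below are assumptions chosen for illustration, not measurements:

```python
PAGE = 4096            # page size in bytes
IO_BANDWIDTH = 100e6   # assumed 100 MB/s to auxiliary storage
T_COMPRESS = 5e-6      # assumed 5 microseconds of CPU time per page
RATIO = 2.0            # assumed 2:1 compression ratio

t_uncompressed = PAGE / IO_BANDWIDTH                       # ~41 us
t_compressed = T_COMPRESS + (PAGE / RATIO) / IO_BANDWIDTH  # ~25.5 us

# Compression only helps while the CPU cost stays below the I/O saved:
assert t_compressed < t_uncompressed
```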

Increased thrashing

The physical memory used by a compression system reduces the amount of physical memory available to the processes a system runs, which may result in increased paging activity and reduced overall effectiveness of virtual memory compression. The relationship between paging activity and available physical memory is roughly exponential: reducing the amount of physical memory available to system processes results in an exponential increase of paging activity. [12] [13]

In circumstances where the amount of free physical memory is low and paging is fairly prevalent, any performance gains provided by the compression system (compared to paging directly to and from auxiliary storage) may be offset by an increased page-fault rate that leads to thrashing and degraded system performance. In the opposite situation, where enough physical memory is available and paging activity is low, compression may not impact performance enough to be noticeable. The middle ground between these two circumstances (low RAM with high paging activity, and plenty of RAM with low paging activity) is where virtual memory compression may be most useful. However, the more compressible the program data is, the more pronounced the performance improvements, as less physical memory is needed to hold the compressed data.

For example, in order to maximize the use of a compressed-pages cache, Helix Software Company's Hurricane 2.0 provides a user-configurable compression rejection threshold. By compressing the first 256 to 512 bytes of a 4 KiB page, this virtual memory compression system determines whether the configured threshold can be achieved for a particular page; if so, the rest of the page is compressed and retained in a compressed cache, and otherwise the page is sent to auxiliary storage through the normal paging system. The default setting for this threshold is an 8:1 compression ratio. [14] [4]
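
The rejection test can be sketched as follows; this is a hypothetical reconstruction, with zlib standing in for the Lempel–Ziv–Stac codec Hurricane actually used:

```python
import zlib

PROBE_BYTES = 512       # sample the first 256 to 512 bytes of the page
REJECT_THRESHOLD = 8.0  # default 8:1 ratio, per the account above

def should_keep_compressed(page: bytes) -> bool:
    # Probe-compress a prefix of the page; only pages that meet the
    # threshold are fully compressed and retained in the compressed
    # cache, while the rest take the normal path to auxiliary storage.
    probe = page[:PROBE_BYTES]
    return len(probe) / len(zlib.compress(probe)) >= REJECT_THRESHOLD
```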

CPU utilization overhead

In hardware implementations, the technology also relies on price differentials between the various components of the system, for example, the difference between the cost of RAM and the cost of a processor dedicated to compression. The relative price-to-performance differences of the various components tend to vary over time. For example, the addition of a compression co-processor may have minimal impact on the cost of a CPU.

Prioritization

In a typical virtual memory implementation, paging happens on a least-recently-used basis, potentially causing the compression algorithm to spend CPU cycles dealing with the lowest-priority data. Furthermore, program code is usually read-only and is therefore never paged out; instead, code is simply discarded and re-loaded from the program's auxiliary storage file if needed. In this case the bar for compression is higher, since the I/O cycle it is attempting to eliminate is much shorter, particularly on flash memory devices.
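
A sketch of that prioritization as a policy function; the `Page` type and the probe predicate are invented here for illustration:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Page:
    data: bytes
    read_only: bool
    dirty: bool

def disposition(page: Page, compresses_well: Callable[[bytes], bool]) -> str:
    # Clean, read-only pages (typically code) are cheapest to evict:
    # discard them and re-read from the program file on the next fault.
    if page.read_only and not page.dirty:
        return "discard"
    # Dirty data pages must be preserved; compress the ones that pay
    # for the CPU cost, and swap out the rest.
    if compresses_well(page.data):
        return "compress"
    return "swap"
```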

Compression using quantization

Accelerator designers exploit quantization to reduce the bitwidth of values and the cost of data movement. However, any value that does not fit in the reduced bitwidth results in an overflow; such values are called outliers. Accelerators therefore use quantization for applications that are tolerant of overflows. In most applications the rate of outliers is low, and values often lie within a narrow range, [15] providing an opportunity to exploit quantization in general-purpose processors as well. However, a software implementation of quantization in general-purpose processors has three problems. First, the programmer has to manually implement the conversions and the additional instructions that quantize and dequantize values, which imposes programming effort and performance overhead. Second, to cover outliers, the bitwidth of the quantized values often becomes greater than or equal to that of the original values. Third, the programmer has to use standard bitwidths; extracting non-standard bitwidths (i.e., 1–7, 9–15, and 17–31 bits) to represent narrow integers exacerbates the overhead of software-based quantization.

Hardware support in the memory hierarchy of general-purpose processors can solve these problems. It allows values to be represented by few, flexible numbers of bits while storing outliers in their original format in a separate space, preventing overflow. Such hardware minimizes metadata and the overhead of locating quantized values through a software-hardware interaction that transfers quantization parameters and data layout to the hardware. As a result, transparent hardware-based quantization has three advantages over cache compression techniques: (i) less metadata, (ii) a higher compression ratio for floating-point values and for cache blocks containing multiple data types, and (iii) lower overhead for locating compressed blocks. [15]
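
The overflow-free idea (narrow storage plus a side table holding outliers in their original format) can be sketched in software, although the paper describes hardware support in the memory hierarchy; all names here are invented for illustration:

```python
def quantize(values, bits=8):
    # Values that fit the narrow bitwidth are stored directly; values
    # that would overflow (outliers) are kept in their original format
    # in a separate table, so no information is lost.
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    narrow, outliers = [], {}
    for i, v in enumerate(values):
        if lo <= v <= hi:
            narrow.append(v)
        else:
            narrow.append(0)  # placeholder slot in the narrow array
            outliers[i] = v
    return narrow, outliers

def dequantize(narrow, outliers):
    return [outliers.get(i, v) for i, v in enumerate(narrow)]
```

For example, `quantize([3, -7, 100000, 12])` yields `([3, -7, 0, 12], {2: 100000})`, and `dequantize` restores the original list.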

History

Virtual memory compression has gone in and out of favor as a technology. The price of RAM and external storage has plummeted and their speed has risen, thanks to Moore's Law and improved RAM interfaces such as DDR3, reducing the need for virtual memory compression; meanwhile, multi-core processors, server farms, and mobile technology, together with the advent of flash-based systems, have made virtual memory compression more attractive.

Origins

Acorn Computers' Unix variant, RISC iX, was supplied as the primary operating system for its R140 workstation released in 1989. [16] RISC iX provided support for demand paging of compressed executable files. However, the principal motivation for providing compressed executable files was to accommodate a complete Unix system in a hard disk of relatively modest size. Compressed data was not paged out to disk under this scheme. [17] [18]

Paul R. Wilson proposed compressed caching of virtual memory pages in 1990, in a paper circulated at the ACM OOPSLA/ECOOP '90 Workshop on Garbage Collection ("Some Issues and Strategies in Heap Management and Memory Hierarchies"), and appearing in ACM SIGPLAN Notices in January 1991. [19]

Helix Software Company pioneered virtual memory compression in 1992, filing a patent application for the process in October of that year. [2] In 1994 and 1995, Helix refined the process using test-compression and secondary memory caches on video cards and other devices. [3] However, Helix did not release a product incorporating virtual memory compression until July 1996 and the release of Hurricane 2.0, which used the Stac Electronics Lempel–Ziv–Stac compression algorithm and also used off-screen video RAM as a compression buffer to gain performance benefits. [14]

In 1995, RAM cost nearly $50 per megabyte, and Microsoft's Windows 95 listed a minimum requirement of 4 MB of RAM. [20] Due to the high RAM requirement, several programs were released which claimed to use compression technology to gain “memory”. Most notorious was the SoftRAM program from Syncronys Softcorp. SoftRAM was exposed as fake because it did not perform any compression at all. [21] [9] Other products, including Hurricane and MagnaRAM, included virtual memory compression, but implemented only run-length encoding, with poor results, giving the technology a negative reputation. [22]

In its 8 April 1997 issue, PC Magazine published a comprehensive test of the performance-enhancement claims of several software virtual memory compression tools. In its testing, PC Magazine found a minimal (5% overall) performance improvement from the use of Hurricane, and none at all from any of the other packages. [22] However, the tests were run on single-core, single-threaded Intel Pentium systems, and thus compression directly impacted all system activity.

In 1996, IBM began experimenting with compression, and in 2000 IBM announced its Memory eXpansion Technology (MXT). [23] [24] MXT was a stand-alone chip which acted as a CPU cache between the CPU and memory controller. MXT had an integrated compression engine which compressed all data heading to/from physical memory. Subsequent testing of the technology by Intel showed 5–20% overall system performance improvement, similar to the results obtained by PC Magazine with Hurricane. [25]

Recent developments

In the Linux kernel, the compcache project (later renamed zram) implemented a compressed block device in RAM usable as a swap device, [26] and the zswap feature added a compressed write-back cache for swapped pages. In 2010, IBM introduced Active Memory Expansion (AME) for AIX 6.1, [27] with compression acceleration later added to POWER hardware. [28] In 2013, Apple introduced virtual memory compression in OS X 10.9 Mavericks, using the WKdm algorithm. [29] [30] In August 2015, Microsoft shipped memory compression in Windows 10 (Insider Preview Build 10525). [31]

References

  1. Wilson, Paul R.; Kaplan, Scott F.; Smaragdakis, Yannis (1999-06-06). The Case for Compressed Caching in Virtual Memory Systems (PDF). USENIX Annual Technical Conference. Monterey, California, USA. pp. 101–116.
  2. US 5559978, Spilo, Michael L., "Method for increasing the efficiency of a virtual memory system by selective compression of RAM memory contents", published 1996-09-24, assigned to Helix Software Co., Inc.
  3. US 5875474, Fabrizio, Daniel & Spilo, Michael L., "Method for caching virtual memory paging and disk input/output requests using off screen video memory", published 1999-02-23, assigned to Helix Software Co., Inc.
  4. "Mac Memory Booster Gets an Upgrade". Computerworld. IDG Enterprise. 30 (37): 56. 1996-09-09. ISSN 0010-4841. Retrieved 2015-01-12.
  5. Gupta, Nitin. "zram: Compressed RAM-based block devices". docs.kernel.org. The kernel development community. Retrieved 2023-12-29.
  6. "zswap". www.kernel.org. The kernel development community. Retrieved 2023-12-29.
  7. Jennings, Seth. "Transparent Memory Compression in Linux" (PDF). linuxfoundation.org. Archived from the original (PDF) on 2015-01-04. Retrieved 2015-01-01.
  8. "Performance numbers for compcache" . Retrieved 2015-01-01.
  9. 1 2 3 4 Paul, Matthias R. (1997-07-30) [1996-04-14]. "Kapitel II.18. Mit STACKER Hauptspeicher 'virtuell' verdoppeln…" [Utilizing STACKER to 'virtually' double main memory…]. NWDOS-TIPs Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds [NWDOS-TIPs Tips & tricks for Novell DOS 7, with a focus on undocumented details, bugs and workarounds]. Release 157 (in German) (3 ed.). Archived from the original on 2016-11-05. Retrieved 2012-01-11.
  10. Simpson, Matthew (2014). "Analysis of Compression Algorithms for Program Data" (PDF). p. 6. Retrieved 2015-01-09.
  11. Rizzo, Luigi (1996). "A very fast algorithm for RAM compression". ACM SIGOPS Operating Systems Review. 31 (2): 8. doi:10.1145/250007.250012. S2CID 18563587. Retrieved 2015-01-09.
  12. Denning, Peter J. (1968). "Thrashing: Its causes and prevention" (PDF). Proceedings AFIPS, Fall Joint Computer Conference. 33: 918. Retrieved 2015-01-05.
  13. Freedman, Michael J. (2000-03-16). "The Compression Cache: Virtual Memory Compression for Handheld Computers" (PDF). Retrieved 2015-01-09.
  14. 1 2 "Hurricane 2.0 Squeezes the Most Memory from Your System". PC Magazine . 1996-10-08. Retrieved 2015-01-01.
  15. 1 2 Lenjani, Marzieh (2019-11-03). "An Overflow-free Quantized Memory Hierarchy in General-purpose Processors" (PDF). In Proceedings of the IEEE International Symposium on Workload Characterization. Retrieved 2020-03-16.
  16. Cox, James (December 1989). "Power to the People". Acorn User. pp. 66–67, 69, 71. Retrieved 2020-09-06.
  17. Taunton, Mark (1991). "Compressed Executables: An Exercise in Thinking Small". Proceedings of the Summer 1991 USENIX Conference, Nashville, TN, USA, June 1991. USENIX Association: 385–404.
  18. Taunton, Mark (1991-01-22). "Compressed executables". Newsgroup: comp.unix.internals. Usenet: 4743@acorn.co.uk. Retrieved 2020-10-10.
  19. Wilson, Paul R. (1991). "Some Issues and Strategies in Heap Management and Memory Hierarchies". ACM SIGPLAN Notices. 26 (3): 45–52. doi:10.1145/122167.122173. S2CID   15404854.
  20. "Windows 95 Installation Requirements". Microsoft . Retrieved 2015-01-01.
  21. "SoftRAM Under Scruitny". PC Magazine . 1996-01-23. Retrieved 2015-01-01.
  22. 1 2 "Performance Enhancers". PC Magazine . 1997-04-08. Retrieved 2015-01-01.
  23. "IBM Research Breakthrough Doubles Computer Memory Capacity". IBM. 2000-06-26. Retrieved 2015-01-01.
  24. "Memory eXpansion Technologies". IBM . Retrieved 2015-01-01.
  25. Kant, Krishna (2003-02-01). "An Evaluation of Memory Compression Alternatives". Intel Corporation . Retrieved 2015-01-01.
  26. "CompCache". Google code. Retrieved 2015-01-01.
  27. "AIX 6.1 Active Memory Expansion". IBM. Archived from the original on 2015-01-04. Retrieved 2015-01-01.
  28. "IBM Power Systems Hardware Deep Dive" (PDF). IBM. Archived from the original (PDF) on 2015-01-04. Retrieved 2015-01-01.
  29. "OS X 10.9 Mavericks: The Ars Technica Review". 2013-10-22.
  30. "The Case for Compressed Caching in Virtual Memory Systems".
  31. Aul, Gabe (2015-08-18). "Announcing Windows 10 Insider Preview Build 10525". Blogging Windows. Microsoft. Retrieved 2015-08-19.