Kernel same-page merging

In computing, kernel same-page merging (KSM), also known as kernel shared memory, memory merging, memory deduplication, and page deduplication, is a kernel feature that allows a hypervisor to share memory pages with identical contents between multiple processes or virtualized guests. Although the two are not directly linked, Kernel-based Virtual Machine (KVM) can use KSM to merge memory pages occupied by virtual machines.

Deduplication

KSM performs memory deduplication by scanning main memory for physical pages with identical content and identifying the virtual pages mapped to them. It leaves one page unchanged, remaps each duplicate to point to that same physical page, and then releases the extra physical pages for reuse. It also marks the affected virtual pages as "copy-on-write" (COW), so that the kernel automatically gives a virtual page its own separate physical page again as soon as any process begins to write to it. [1]
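The merge-then-copy-on-write cycle described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the kernel's implementation: the class `ToyMemory`, its methods, and the page names are hypothetical, and identical pages are found by grouping on content in a dictionary rather than by the kernel's own scanning and comparison logic.

```python
from collections import defaultdict


class ToyMemory:
    """Toy model: virtual pages map to physical frames that hold bytes."""

    def __init__(self):
        self.frames = {}      # frame id -> page content
        self.page_table = {}  # virtual page name -> frame id
        self.cow = set()      # virtual pages marked copy-on-write
        self._next_frame = 0

    def _alloc(self, content):
        frame = self._next_frame
        self._next_frame += 1
        self.frames[frame] = content
        return frame

    def map_page(self, vpage, content):
        self.page_table[vpage] = self._alloc(content)

    def _sharers(self, frame):
        return [v for v, f in self.page_table.items() if f == frame]

    def merge_identical(self):
        """Keep one frame per distinct content; return the frames released."""
        groups = defaultdict(list)
        for vpage, frame in self.page_table.items():
            groups[self.frames[frame]].append(vpage)
        freed = 0
        for vpages in groups.values():
            if len(vpages) < 2:
                continue
            keep = self.page_table[vpages[0]]
            for vpage in vpages:
                old = self.page_table[vpage]
                if old != keep:
                    self.page_table[vpage] = keep  # remap the duplicate
                    if not self._sharers(old):
                        del self.frames[old]       # release the extra frame
                        freed += 1
            self.cow.update(vpages)                # all sharers become COW
        return freed

    def write(self, vpage, content):
        frame = self.page_table[vpage]
        if vpage in self.cow:
            self.cow.discard(vpage)
            if len(self._sharers(frame)) > 1:
                # Break the sharing: the writer gets its own private frame.
                self.page_table[vpage] = self._alloc(content)
                return
        self.frames[frame] = content


mem = ToyMemory()
mem.map_page("vm1:0", b"A" * 4096)
mem.map_page("vm2:0", b"A" * 4096)  # same content as vm1:0
mem.map_page("vm2:1", b"B" * 4096)
freed = mem.merge_identical()        # vm1:0 and vm2:0 now share one frame
mem.write("vm2:0", b"C" * 4096)      # the write separates them again
```

After the merge, the two identical pages share one frame; the later write to `vm2:0` gives it a private frame while `vm1:0` keeps the original content.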

KSM was originally intended to run more virtual machines on one host by sharing memory between processes as well as virtual machines. An experimental implementation of KSM by Red Hat found that 52 virtual instances of Windows XP, each with 1 GB of memory, could run on a host computer that had only 16 GB of RAM. [2] After its implementation, users found KSM to be useful also in non-virtualized environments in which memory is at a premium. [3] [4]
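The arithmetic behind the Red Hat figures can be checked with a short calculation. This is a rough sketch: the function name `overcommit` is hypothetical, and it assumes, for illustration only, that every guest actually touches its full 1 GB allocation, which the source does not state.

```python
def overcommit(guests, guest_gb, host_gb):
    """Return nominal guest memory, overcommit ratio, and the minimum
    fraction of guest memory that cannot have its own physical copy."""
    nominal_gb = guests * guest_gb
    ratio = nominal_gb / host_gb
    min_not_backed = max(0.0, 1 - host_gb / nominal_gb)
    return nominal_gb, ratio, min_not_backed


nominal_gb, ratio, min_not_backed = overcommit(guests=52, guest_gb=1, host_gb=16)
print(f"nominal guest memory: {nominal_gb} GB")
print(f"overcommit ratio:     {ratio:.2f}x")
print(f"minimum not backed:   {min_not_backed:.1%}")
```

Under that assumption the guests nominally claim 52 GB on a 16 GB host, a ratio of 3.25, so at least 36 GB (about 69%) of the nominal guest memory would have to be shared or otherwise not resident.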

KSM was merged into the Linux kernel mainline in version 2.6.32, which was released on December 3, 2009. [3] [2] To be effective, the operating system kernel must find identical memory pages held by different processes. The kernel also needs to predict whether the pages are likely to be updated infrequently enough that merging them is an efficient use of processor resources. [4] A concern is that although memory usage is reduced, CPU usage is increased, which may negate the potential performance gains. [1]
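On Linux, a process typically opts a memory region in to KSM scanning with `madvise(MADV_MERGEABLE)`, and the kernel exposes counters under `/sys/kernel/mm/ksm/`. The following is a minimal sketch of that opt-in, assuming Python 3.8 or later on a Linux kernel built with KSM; the helper names `make_mergeable_region` and `read_ksm_stat` are hypothetical, and the code degrades to reporting "not available" on other systems.

```python
import mmap
import os

KSM_DIR = "/sys/kernel/mm/ksm"  # sysfs directory used by Linux KSM


def make_mergeable_region(num_pages):
    """Create a private anonymous mapping filled with identical pages and
    ask the kernel to consider it for merging. Returns (region, opted_in)."""
    if not hasattr(mmap, "MADV_MERGEABLE"):
        return None, False  # platform or Python build without KSM support
    region = mmap.mmap(-1, num_pages * mmap.PAGESIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    region[:] = b"\x42" * len(region)  # every page gets the same content
    try:
        region.madvise(mmap.MADV_MERGEABLE)
        return region, True
    except OSError:
        return region, False  # e.g. kernel built without KSM


def read_ksm_stat(name):
    """Read one integer counter such as 'run' or 'pages_sharing'."""
    try:
        with open(os.path.join(KSM_DIR, name)) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None  # KSM not present or counter unreadable


region, opted_in = make_mergeable_region(16)
print("opted in:", opted_in)
print("ksm running:", read_ksm_stat("run"))
print("pages_sharing:", read_ksm_stat("pages_sharing"))
```

Merging is not immediate: it happens only while the KSM scanner is running and after it has reached the region, so the counters may stay unchanged for some time. The scanner's work is the source of the extra CPU usage mentioned above.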

Security risks

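Security is also a concern. Researchers have described memory deduplication as a threat to guest operating systems, [5] and published attacks that rely on it include breaking ASLR in cloud environments, [6] timing-based attacks, [7] [8] and a Rowhammer-based attack on Linux virtual machines. [9]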

References

  1. "KSM tries again". lwn.net. kernel.org. Retrieved August 21, 2010.
  2. "Linux kernel 2.6.32, Section 1.3. Kernel Samepage Merging (memory deduplication)". KernelNewbies.org. December 3, 2009. Retrieved August 12, 2015.
  3. "Anatomy of Linux Kernel Shared Memory". IBM Developer. IBM. Archived from the original on July 9, 2016. Retrieved August 21, 2010.
  4. "Increasing memory density by using KSM" (PDF). Kernel.org. Retrieved August 21, 2010.
  5. Kuniyasu Suzaki; Kengo Iijima; Toshiki Yagi; Cyrille Artho. "Memory Deduplication as a Threat to the Guest OS" (PDF). Archived from the original (PDF) on January 5, 2013. Retrieved September 16, 2013.
  6. Barresi, Antonio; Razavi, Kaveh; Payer, Mathias; Gross, Thomas R. (August 2015). "CAIN: Silently Breaking ASLR in the Cloud" (PDF). USENIX. Retrieved August 12, 2015.
  7. Red Hat; Polyakov, Andy. "It's all a question of time - AES timing attacks on OpenSSL". access.redhat.com. Red Hat. Retrieved August 4, 2016.
  8. Schwarzl, Martin; Kraft, Erik; Lipp, Moritz; Gruss, Daniel (2021). "Remote Memory-Deduplication Attacks". Network and Distributed System Security Symposium. Retrieved April 2, 2022.
  9. "New FFS Rowhammer Attack Hijacks Linux VMs". Retrieved August 17, 2016.