Second Level Address Translation

Last updated

Second Level Address Translation (SLAT), also known as nested paging, is a hardware-assisted virtualization technology which makes it possible to avoid the overhead associated with software-managed shadow page tables.

Contents

AMD has supported SLAT through the Rapid Virtualization Indexing (RVI) technology since the introduction of its third-generation Opteron processors (code name Barcelona). Intel's implementation of SLAT, known as Extended Page Table (EPT), was introduced in the Nehalem microarchitecture found in certain Core i7, Core i5, and Core i3 processors.

ARM's virtualization extensions support SLAT, known as Stage-2 page-tables provided by a Stage-2 MMU. The guest uses the Stage-1 MMU. Support was added as optional in the ARMv7ve architecture and is also supported in the ARMv8 (32-bit and 64-bit) architectures.

Overview

The introduction of protected mode to the x86 architecture with the Intel 80286 processor brought the concepts of physical memory and virtual memory to mainstream architectures. When processes use virtual addresses and an instruction requests access to memory, the processor translates the virtual address to a physical address using a page table or translation lookaside buffer (TLB). When running a virtual system, it has allocated virtual memory of the host system that serves as a physical memory for the guest system, and the same process of address translation goes on also within the guest system. This increases the cost of memory access since the address translation needs to be performed twice  once inside the guest system (using software-emulated guest page table), and once inside the host system (using physical map[pmap]).

In order to make this translation efficient, software engineers implemented software based shadow page table. Shadow page table will translate guest virtual memory directly to host physical memory address. Each VM has a separate shadow page table and hypervisor is in charge of managing them. But the cost is very expensive since every time a guest updates its page table, it will trigger the hypervisor to manage the allocation of the page table and its changes.

In order to make this translation more efficient, processor vendors implemented technologies commonly called SLAT. By treating each guest-physical address as a host-virtual address, a slight extension of the hardware used to walk a non-virtualized page table (now the guest page table) can walk the host page table. With multilevel page tables the host page table can be viewed conceptually as nested within the guest page table. A hardware page table walker can treat the additional translation layer almost like adding levels to the page table.

Using SLAT and multilevel page tables, the number of levels needed to be walked to find the translation doubles when the guest-physical address is the same size as the guest-virtual address and the same size pages are used. This increases the importance of caching values from intermediate levels of the host and guest page tables. It is also helpful to use large pages in the host page tables to reduce the number of levels (e.g., in x86-64, using 2  MB pages removes one level in the page table). Since memory is typically allocated to virtual machines at coarse granularity, using large pages for guest-physical translation is an obvious optimization, reducing the depth of look-ups and the memory required for host page tables.

Implementations

Rapid Virtualization Indexing

Rapid Virtualization Indexing (RVI), known as Nested Page Tables (NPT) during its development, is an AMD second generation hardware-assisted virtualization technology for the processor memory management unit (MMU). [1] [2] RVI was introduced in the third generation of Opteron processors, code name Barcelona. [3]

A VMware research paper found that RVI offers up to 42% gains in performance compared with software-only (shadow page table) implementation. [4] Tests conducted by Red Hat showed a doubling in performance for OLTP benchmarks. [5]

Extended Page Tables

Extended Page Tables (EPT) is an Intel second-generation x86 virtualization technology for the memory management unit (MMU). EPT support is found in Intel's Core i3, Core i5, Core i7 and Core i9 CPUs, among others. [6] It is also found in some newer VIA CPUs. EPT is required in order to launch a logical processor directly in real mode, a feature called "unrestricted guest" in Intel's jargon, and introduced in the Westmere microarchitecture. [7] [8]

According to a VMware evaluation paper, "EPT provides performance gains of up to 48% for MMU-intensive benchmarks and up to 600% for MMU-intensive microbenchmarks", although it can actually cause code to run slower than a software implementation in some corner cases. [9]

Stage-2 page-tables

Stage-2 page-table support is present in ARM processors that implement exception level 2 (EL2).

Extensions

Mode Based Execution Control

Mode Based Execution Control (MBEC) is an extension to x86 SLAT implementations first available in Intel Kaby Lake and AMD Zen+ CPUs (known on the latter as Guest Mode Execute Trap or GMET). [10] The extension extends the execute bit in the extended page table (guest page table) into 2 bits - one for user execute, and one for supervisor execute. [11]

MBE was introduced to speed up guest usermode unsigned code execution with kernelmode code integrity enforcement. Under this configuration, unsigned code pages can be marked as execute under usermode, but must be marked as no-execute under kernelmode. To maintain integrity by ensuring all guest kernelmode executable code are signed even when the guest kernel is compromised, the guest kernel does not have permission to modify the execute bit of any memory pages. Modification of the execute bit, or switching of the guest page table which contains the execute bit, is delegated to a higher privileged entity, in this case the host hypervisor. Without MBE, each entrance from unsigned usermode execution to signed kernelmode execution must be accompanied by a VM exit to the hypervisor to perform a switch to the kernelmode page table. On the reverse operation, an exit from signed kernelmode to unsigned usermode must be accompanied by a VM exit to perform another page table switch. VM exits significantly impact code execution performance. [12] [13] With MBE, the same page table can be shared between unsigned usermode code and signed kernelmode code, with two sets of execute permission depending on the execution context. VM exits are no longer necessary when execution context switches between unsigned usermode and signed kernel mode.

Support in software

Hypervisors that support SLAT include the following:

Some of the above hypervisors require SLAT in order to work at all (not just faster) as they do not implement a software shadow page table; the list is not fully updated to reflect that.

See also

Related Research Articles

In computing, a virtual machine (VM) is the virtualization or emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve specialized hardware, software, or a combination of the two. Virtual machines differ and are organized by their function, shown here:

<span class="mw-page-title-main">Xen</span> Type-1 hypervisor

Xen is a free and open-source type-1 hypervisor, providing services that allow multiple computer operating systems to execute on the same computer hardware concurrently. It was originally developed by the University of Cambridge Computer Laboratory and is now being developed by the Linux Foundation with support from Intel, Citrix, Arm Ltd, Huawei, AWS, Alibaba Cloud, AMD, Bitdefender and epam.

In the 80386 microprocessor and later, virtual 8086 mode allows the execution of real mode applications that are incapable of running directly in protected mode while the processor is running a protected mode operating system. It is a hardware virtualization technique that allowed multiple 8086 processors to be emulated by the 386 chip. It emerged from the painful experiences with the 80286 protected mode, which by itself was not suitable to run concurrent real-mode applications well. John Crawford developed the Virtual Mode bit at the register set, paving the way to this environment.

x86 virtualization is the use of hardware-assisted virtualization capabilities on an x86/x86-64 CPU.

A hypervisor, also known as a virtual machine monitor (VMM) or virtualizer, is a type of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called a host machine, and each virtual machine is called a guest machine. The hypervisor presents the guest operating systems with a virtual operating platform and manages the execution of the guest operating systems. Unlike an emulator, the guest executes most instructions on the native hardware. Multiple instances of a variety of operating systems may share the virtualized hardware resources: for example, Linux, Windows, and macOS instances can all run on a single physical x86 machine. This contrasts with operating-system–level virtualization, where all instances must share a single kernel, though the guest operating systems can differ in user space, such as different Linux distributions with the same kernel.

In computing, paravirtualization or para-virtualization is a virtualization technique that presents a software interface to the virtual machines which is similar, yet not identical, to the underlying hardware–software interface.

<span class="mw-page-title-main">QEMU</span> Free virtualization and emulation software

QEMU is a free and open-source emulator. It emulates a computer's processor through dynamic binary translation and provides a set of different hardware and device models for the machine, enabling it to run a variety of guest operating systems. It can interoperate with Kernel-based Virtual Machine (KVM) to run virtual machines at near-native speed. QEMU can also do emulation for user-level processes, allowing applications compiled for one architecture to run on another.

Platform virtualization software, specifically emulators and hypervisors, are software packages that emulate the whole physical computer machine, often providing multiple virtual machines on one physical platform. The table below compares basic information about platform virtualization hypervisors.

In the x86 architecture, the CPUID instruction is a processor supplementary instruction allowing software to discover details of the processor. It was introduced by Intel in 1993 with the launch of the Pentium and SL-enhanced 486 processors.

In computing, hardware-assisted virtualization is a platform virtualization approach that enables efficient full virtualization using help from hardware capabilities, primarily from the host processors. A full virtualization is used to emulate a complete hardware environment, or virtual machine, in which an unmodified guest operating system effectively executes in complete isolation. Hardware-assisted virtualization was added to x86 processors in 2005, 2006 and 2010 (respectively).

<span class="mw-page-title-main">Input–output memory management unit</span> Configuration in computing

In computing, an input–output memory management unit (IOMMU) is a memory management unit (MMU) connecting a direct-memory-access–capable (DMA-capable) I/O bus to the main memory. Like a traditional MMU, which translates CPU-visible virtual addresses to physical addresses, the IOMMU maps device-visible virtual addresses to physical addresses. Some units also provide memory protection from faulty or malicious devices.

The following is a timeline of virtualization development. In computing, virtualization is the use of a computer to simulate another computer. Through virtualization, a host simulates a guest by exposing virtual hardware devices, which may be done through software or by allowing access to a physical device connected to the machine.

<span class="mw-page-title-main">Full virtualization</span> Computing technique involving instances of an environment

In computer science, full virtualization (fv) is a modern virtualization technique developed in late 1990s. It is different from simulation and emulation. Virtualization employs techniques that can create instances of a virtual environment, as opposed to simulation, which models the environment; and emulation, which replicates the target environment with certain kinds of virtual environments called emulation environments for virtual machines. Full virtualization requires that every salient feature of the hardware be reflected into one of several virtual machines – including the full instruction set, input/output operations, interrupts, memory access, and whatever other elements are used by the software that runs on the bare machine, and that is intended to run in a virtual machine. In such an environment, any software capable of execution on the raw hardware can be run in the virtual machine and, in particular, any operating systems. The obvious test of full virtualization is whether an operating system intended for stand-alone use can successfully run inside a virtual machine.

<span class="mw-page-title-main">Kernel-based Virtual Machine</span> Virtualization module in the Linux kernel

Kernel-based Virtual Machine (KVM) is a free and open-source virtualization module in the Linux kernel that allows the kernel to function as a hypervisor. It was merged into the mainline Linux kernel in version 2.6.20, which was released on February 5, 2007. KVM requires a processor with hardware virtualization extensions, such as Intel VT or AMD-V. KVM has also been ported to other operating systems such as FreeBSD and illumos in the form of loadable kernel modules.

Hardware virtualization is the virtualization of computers as complete hardware platforms, certain logical abstractions of their componentry, or only the functionality required to run various operating systems. Virtualization hides the physical characteristics of a computing platform from the users, presenting instead an abstract computing platform. At its origins, the software that controlled virtualization was called a "control program", but the terms "hypervisor" or "virtual machine monitor" became preferred over time.

<span class="mw-page-title-main">Hyper-V</span> Native hypervisor by Microsoft

Microsoft Hyper-V, codenamed Viridian, and briefly known before its release as Windows Server Virtualization, is a native hypervisor; it can create virtual machines on x86-64 systems running Windows. Starting with Windows 8, Hyper-V superseded Windows Virtual PC as the hardware virtualization component of the client editions of Windows NT. A server computer running Hyper-V can be configured to expose individual virtual machines to one or more networks. Hyper-V was first released with Windows Server 2008, and has been available without additional charge since Windows Server 2012 and Windows 8. A standalone Windows Hyper-V Server is free, but has a command-line interface only. The last version of free Hyper-V Server is Hyper-V Server 2019, which is based on Windows Server 2019.

Live migration, also called migration, refers to the process of moving a running virtual machine (VM) or application between different physical machines without disconnecting the client or application. Memory, storage, and network connectivity of the virtual machine are transferred from the original guest machine to the destination. The time between stopping the VM or application on the source and resuming it on destination is called 'downtime'. When the downtime of a VM during live migration is small enough that it is not noticeable by the end user, it is called a 'seamless' live migration.

libvirt Management tool

libvirt is an open-source API, daemon and management tool for managing platform virtualization. It can be used to manage KVM, Xen, VMware ESXi, QEMU and other virtualization technologies. These APIs are widely used in the orchestration layer of hypervisors in the development of a cloud-based solution.

In computer security, virtual machine (VM) escape is the process of a program breaking out of the virtual machine on which it is running and interacting with the host operating system. A virtual machine is a "completely isolated guest operating system installation within a normal host operating system". In 2008, a vulnerability in VMware discovered by Core Security Technologies made VM escape possible on VMware Workstation 6.0.2 and 5.5.4. A fully working exploit labeled Cloudburst was developed by Immunity Inc. for Immunity CANVAS. Cloudburst was presented in Black Hat USA 2009.

GPU virtualization refers to technologies that allow the use of a GPU to accelerate graphics or GPGPU applications running on a virtual machine. GPU virtualization is used in various applications such as desktop virtualization, cloud gaming and computational science.

References

  1. "Rapid Virtualization Indexing with Windows Server 2008 R2 Hyper-V | The Virtualization Blog". Blogs.amd.com. 2009-03-23. Retrieved 2010-05-16.
  2. "AMD-V Nested Paging" (PDF). July 2008. Archived from the original (PDF) on 2012-09-05. Retrieved 2013-12-11.
  3. "VMware engineer praises AMD's Nested Page Tables". Searchservervirtualization.techtarget.com. 2008-07-21. Retrieved 2010-05-16.
  4. 1 2 "Performance Evaluation of AMD RVI Hardware Assist" (PDF). Retrieved 2010-05-16.
  5. "Red Hat Magazine | Red Hat Enterprise Linux 5.1 utilizes nested paging on AMD Barcelona Processor to improve performance of virtualized guests". Magazine.redhat.com. 2007-11-20. Retrieved 2010-05-16.
  6. "Intel Virtualization Technology List". Ark.intel.com. Retrieved 2014-02-17.
  7. "Intel added unrestricted guest mode on Westmere micro-architecture and later Intel CPUs, it uses EPT to translate guest physical address access to host physical address. With this mode, VMEnter without enable paging is allowed."
  8. "Intel 64 and IA-32 Architectures Developer's Manual, Vol. 3C" (PDF). Intel. Retrieved 13 December 2015. If the 'unrestricted guest' VM-execution control is 1, the 'enable EPT' VM-execution control must also be 1.
  9. Performance Evaluation of Intel EPT Hardware Assist
  10. Cunningham, Andrew (2021-08-27). "Why Windows 11 has such strict hardware requirements, according to Microsoft". Ars Technica. Retrieved 2024-03-18.
  11. Mulnix, David L. "Intel Xeon Processor Scalable Family Technical Overview". intel. Retrieved 3 September 2021.
  12. Analysis of the Attack Surface of Windows 10 Virtualization-based Security
  13. Arkley, Brent. "The potential performance Impact of Device Guard (HVCI)". Borec's Legacy meets Modern Device Management Blog. Retrieved 3 September 2021.
  14. "AMD-V Rapid Virtualization Indexing and Windows Server 2008 R2 Hyper-V Second Level Address Translation". Doing IT Virtual. Retrieved 2010-05-16.
  15. Bott, Ed (2011-12-08). "Does your PC have what it takes to run Windows 8's Hyper-V?". ZDNet. Retrieved 2014-02-17.
  16. "Support & Drivers" . Retrieved 13 December 2015.
  17. "Hypervisor | Apple Developer Documentation".
  18. "Kernel Newbies: Linux 2 6 26".
  19. Sheng Yang (2008-06-12). "Extending KVM with new Intel Virtualization technology" (PDF). linux-kvm.org. KVM Forum. Archived from the original (PDF) on 2014-03-27. Retrieved 2013-03-17.
  20. Inc, Parallels. "KB Parallels: What's new in Parallels Desktop 5 for Mac". kb.parallels.com. Retrieved 2016-04-12.{{cite web}}: |last= has generic name (help)
  21. "Changelog for VirtualBox 2.0". Archived from the original on 2014-10-22.
  22. liz. "VMware Workstation 14 Pro Release Notes". docs.vmware.com. Retrieved 2020-11-19.
  23. "Benchmarks: Xen 3.2.0 on AMD Quad-Core Opteron with RVI". 2008-06-15. Retrieved 2011-05-13.
  24. "Hardware Compatibility List (HCL)". Qubes OS. Retrieved 2020-01-06.
  25. Implementation of a BIOS emulation support for BHyVe: A BSD Hypervisor
  26. "21.7. FreeBSD as a Host with bhyve" . Retrieved 13 December 2015.
  27. Coming Soon to OpenBSD/amd64: A Native Hypervisor
  28. vmm(4) — virtual machine monitor
  29. ACRN Memory Management High-Level Design
  30. "Features/VT-d - QEMU". wiki.qemu.org. Retrieved 2023-11-12.
  31. "Hyper-V Enlightenments — QEMU documentation". www.qemu.org. Retrieved 2023-11-12.
  32. "Add Intel VT-d nested translation [LWN.net]". lwn.net. Retrieved 2023-11-12.
  33. "Intel Virtualisation: How VT-x, KVM and QEMU Work Together". Binary Debt. 2018-10-14. Retrieved 2023-11-12.
  34. "Features/KVMNestedVirtualizationTestsuite - QEMU". wiki.qemu.org. Retrieved 2023-11-12.