Task state segment

The task state segment (TSS) is a structure on x86-based computers that holds information about a task. The operating system kernel uses it for task management. Specifically, the following information is stored in the TSS:

  1. Processor register state
  2. I/O port permissions
  3. Inner-level stack pointers
  4. Previous TSS link

All this information should be stored at specific locations within the TSS as specified in the IA-32 manuals.
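
As an illustration of those fixed locations, the 32-bit TSS layout can be sketched as a C structure. The offsets follow the layout given in the Intel manuals; the structure and field names (`tss32`, `iopb_offset`, and so on) are assumptions chosen for this sketch, not names taken from any particular kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 32-bit TSS. Field names are illustrative only. */
struct tss32 {
    uint16_t prev_link, reserved0;  /* selector of the previous TSS */
    uint32_t esp0;                  /* inner-level stack pointers */
    uint16_t ss0, reserved1;
    uint32_t esp1;
    uint16_t ss1, reserved2;
    uint32_t esp2;
    uint16_t ss2, reserved3;
    uint32_t cr3;                   /* PDBR */
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4;
    uint16_t cs, reserved5;
    uint16_t ss, reserved6;
    uint16_t ds, reserved7;
    uint16_t fs, reserved8;
    uint16_t gs, reserved9;
    uint16_t ldtr, reserved10;
    uint16_t trap;                  /* bit 0: debug trap flag */
    uint16_t iopb_offset;           /* offset of the I/O permission bitmap */
};

/* Compile-time checks that the sketch matches the documented offsets. */
_Static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss32, esp0) == 4, "ESP0 at offset 4");
_Static_assert(offsetof(struct tss32, cr3) == 28, "CR3 at offset 28");
_Static_assert(offsetof(struct tss32, eip) == 32, "EIP at offset 32");
_Static_assert(offsetof(struct tss32, ldtr) == 96, "LDTR at offset 96");
_Static_assert(offsetof(struct tss32, iopb_offset) == 102, "IOPB at offset 102");
```

Every 32-bit field in this layout falls on a 4-byte boundary, so the sketch does not need a packing attribute.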

Location of the TSS

The TSS may reside anywhere in memory. A segment register called the task register (TR) holds a segment selector that points to a valid TSS segment descriptor which resides in the GDT (a TSS descriptor may not reside in the LDT). Therefore, to use a TSS the following must be done by the operating system kernel:

  1. Create a TSS descriptor entry in the GDT
  2. Load the TR with the segment selector for that segment
  3. Add information to the TSS in memory as needed

For security purposes, the TSS should be placed in memory that is accessible only to the kernel.
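
The first of these steps can be sketched as follows. This is a minimal illustration that assumes a 32-bit GDT with 8-byte entries and a kernel-only (DPL 0) TSS; the function name `make_tss_descriptor` is hypothetical. It only packs the descriptor bits. It does not modify a GDT or execute LTR, both of which require kernel mode.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Pack a 32-bit "available TSS" descriptor (hypothetical helper).
 *   bits  0-15  limit 15:0
 *   bits 16-39  base 23:0
 *   bits 40-47  access byte: P=1, DPL=0, S=0, type=0x9 (available 32-bit TSS)
 *   bits 48-51  limit 19:16
 *   bits 52-55  flags (left at 0: byte granularity)
 *   bits 56-63  base 31:24
 */
uint64_t make_tss_descriptor(uint32_t base, uint32_t limit)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;
    d |= (uint64_t)0x89u << 40;
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;
    return d;
}
```

For a TSS that holds only the 104 fixed bytes, the limit passed in would be 103 (size minus one); a larger limit is needed if an I/O permission bitmap follows the fixed fields.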

Task register

The task register (TR) is a 16-bit register which holds a segment selector for the TSS. It may be loaded through the LTR instruction. LTR is a privileged instruction and acts in a manner similar to other segment register loads. The task register has two parts: a portion that is visible to and accessible by the programmer (the selector), and an invisible portion (the cached base address and limit) that is automatically loaded from the TSS descriptor.
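
The visible part follows the usual x86 selector format: a 13-bit table index, a table-indicator bit and a 2-bit requested privilege level. The helpers below are a small sketch of that format; their names are assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a selector that refers to entry `index` of the GDT (TI bit = 0). */
uint16_t make_gdt_selector(uint16_t index, uint16_t rpl)
{
    return (uint16_t)((index << 3) | (rpl & 3u));
}

/* Bits 15..3: index into the descriptor table. */
uint16_t selector_index(uint16_t sel) { return (uint16_t)(sel >> 3); }

/* Bit 2: table indicator, 0 = GDT, 1 = LDT. */
bool selector_uses_ldt(uint16_t sel) { return ((sel >> 2) & 1u) != 0; }

/* Bits 1..0: requested privilege level. */
uint16_t selector_rpl(uint16_t sel) { return (uint16_t)(sel & 3u); }
```

Because a TSS descriptor may not reside in the LDT, a selector intended for LTR would be one for which `selector_uses_ldt` returns false.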

Register states

The TSS may contain saved values of all the x86 registers, which the processor uses for hardware task switching. The operating system may load the TSS with the register values that the new task needs; after a hardware task switch (triggered, for example, by an IRET instruction), the x86 CPU loads the saved values from the TSS into the appropriate registers. Note that some modern operating systems such as Windows and Linux [1] do not use these fields in the TSS, as they implement software task switching.

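Note that during a hardware task switch, certain fields of the old TSS are updated with the CPU's current register contents before the values from the new TSS are read. Thus some TSS fields are read/write, while others are read-only. The general-purpose registers, the segment registers, EIP and EFLAGS are both read and written during a task switch. The PDBR (CR3), the LDTR selector, the inner-level stack pointers and the I/O map base are only ever read by the processor.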

The PDBR field is in fact the very first one read out of the new TSS: since a hardware task switch can also switch to a completely different page table mapping, all the other fields (especially the LDTR) are relative to the new mapping.

I/O port permissions

The TSS contains a 16-bit pointer (an offset from the base of the TSS) to the I/O port permissions bitmap for the current task. This bitmap, usually set up by the operating system when a task is started, specifies the individual ports to which the program should have access. The I/O bitmap is a bit array of port access permissions; if the program has permission to access a port, a "0" is stored at the corresponding bit index, and if the program does not have permission, a "1" is stored there. If the TSS's segment limit is less than the full bitmap, all missing bits are assumed to be "1".

The feature operates as follows: when a program issues an x86 I/O port instruction such as IN or OUT (see x86 instruction listings; note that there are byte-, word- and dword-length versions), the hardware first performs an I/O privilege level (IOPL) check to see whether the program has access to all I/O ports. If the Current Privilege Level (CPL) of the program is numerically greater than the IOPL (that is, the program is less privileged than the IOPL specifies), the program does not have blanket access to all ports. The hardware then checks the I/O permissions bitmap in the TSS to see whether the program can access the specific ports named by the IN or OUT instruction; a word or dword access covers two or four consecutive ports. If every relevant bit in the bitmap is clear, the program is allowed access and the instruction executes. If any relevant bit is set, or lies past the TSS's segment limit, the program does not have access and the processor generates a general protection fault. This feature allows operating systems to grant selective port access to user programs.
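
The check described above can be modelled in a few lines of C. This is a simplified sketch, not a description of the exact hardware sequence: the function name and parameters are hypothetical, the TSS is treated as a plain byte array, and `limit` is taken to be the offset of the last valid byte of the TSS segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model of the I/O permission check (hypothetical helper).
 *   tss         bytes of the TSS, including the bitmap
 *   limit       segment limit: offset of the last valid byte
 *   iopb_offset offset of the bitmap from the start of the TSS
 *   width       1, 2 or 4 ports, for byte, word or dword access
 */
bool io_access_allowed(const uint8_t *tss, uint32_t limit, uint16_t iopb_offset,
                       unsigned cpl, unsigned iopl, uint16_t port, unsigned width)
{
    /* CPL <= IOPL: access to all ports, the bitmap is not consulted. */
    if (cpl <= iopl)
        return true;

    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;
        uint32_t byte_off = (uint32_t)iopb_offset + p / 8;

        /* Bits past the segment limit are treated as "1" (denied). */
        if (byte_off > limit)
            return false;
        /* A set bit denies access to that port. */
        if (tss[byte_off] & (1u << (p % 8)))
            return false;
    }
    return true;
}
```

A caller would translate a `false` result into a general protection fault. One simplification worth noting: real processors read two bytes of the bitmap for each check, which is why the Intel manuals call for a trailing byte of all ones after the bitmap.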

Inner-level stack pointers

The TSS contains 6 fields for specifying the new stack pointer when a privilege level change happens. The field SS0 contains the stack segment selector for CPL=0, and the field ESP0/RSP0 contains the new ESP/RSP value for CPL=0. When an interrupt happens in protected (32-bit) mode while less-privileged code is running and the handler runs at CPL=0, the x86 CPU looks in the TSS for SS0 and ESP0 and loads their values into SS and ESP respectively. This allows the kernel to use a different stack from the user program, and also allows that stack to be unique for each user program.
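
The selection among the three SS:ESP pairs can be sketched as below. The types and the function `stack_for_transfer` are assumptions made for this illustration; they model only the choice of stack, not the frame that the processor then pushes onto it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One SS:ESP pair as stored in the TSS (illustrative type). */
struct stack_ref {
    uint16_t ss;
    uint32_t esp;
};

/* The six fields: SS0:ESP0, SS1:ESP1 and SS2:ESP2. */
struct tss_stacks {
    struct stack_ref ring[3];
};

/*
 * Returns true and fills *out when control moves to a more privileged
 * level (numerically lower), in which case the stack for that level is
 * taken from the TSS. Returns false when no stack switch takes place.
 */
bool stack_for_transfer(const struct tss_stacks *tss, unsigned cpl,
                        unsigned target_pl, struct stack_ref *out)
{
    if (target_pl >= cpl || target_pl > 2)
        return false;
    *out = tss->ring[target_pl];
    return true;
}
```

A kernel that gives every user program its own kernel stack would update the level-0 pair each time it switches to a different program.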

A new feature introduced in the AMD64 extensions is the Interrupt Stack Table (IST), which also resides in the TSS and contains 64-bit stack pointers. If an interrupt descriptor table (IDT) entry specifies an IST entry to use (there are 7), the processor loads the new stack pointer from the IST instead. This allows known-good stacks to be used in case of serious errors (an NMI or a double fault, for example). Previously, the IDT entry for such an exception or interrupt pointed to a task gate, causing the processor to switch to the task that the task gate pointed to. The original register values were saved in the TSS that was current when the interrupt or exception occurred. The processor then set the registers, including SS:ESP, to known values specified in the new TSS and saved the selector of the previous TSS. That approach is no longer available, because hardware task switching is not supported on AMD64.
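
The choice between an IST entry and the ordinary per-level stack pointer can be sketched as follows. The structure and function names are hypothetical, and the model ignores details such as stack alignment; it is meant only to show that a non-zero IST index in the IDT entry takes precedence.

```c
#include <assert.h>
#include <stdint.h>

/* Stack-related part of a 64-bit TSS (illustrative type). */
struct tss64_stacks {
    uint64_t rsp[3];  /* RSP0..RSP2, used on a privilege level change */
    uint64_t ist[7];  /* IST1..IST7, stored here as ist[0]..ist[6] */
};

/*
 * Pick the stack pointer for an interrupt or exception.
 *   gate_ist  3-bit IST field of the IDT entry, 0 means "no IST"
 */
uint64_t interrupt_stack(const struct tss64_stacks *tss, unsigned gate_ist,
                         unsigned cpl, unsigned target_pl, uint64_t current_rsp)
{
    gate_ist &= 7u;
    if (gate_ist != 0)
        return tss->ist[gate_ist - 1];  /* IST entry is always used */
    if (target_pl < cpl && target_pl < 3)
        return tss->rsp[target_pl];     /* ordinary inner-level stack */
    return current_rsp;                 /* same level: stay on this stack */
}
```

In this model a double fault handler whose IDT entry names, say, IST1 gets a known-good stack even when the fault occurs in kernel mode with a corrupted stack pointer.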

Previous TSS link

This is a 16-bit selector that links this TSS to the previous one. It is used only for hardware task switching. See the IA-32 manuals for details.

Use of TSS in Linux

Although a TSS could be created for each task running on the computer, the Linux kernel creates only one TSS for each CPU and uses it for all tasks. This approach was selected because it provides easier portability to other architectures (for example, the AMD64 architecture does not support hardware task switches), as well as improved performance and flexibility. Linux uses only the I/O port permission bitmap and inner stack features of the TSS; the other features are needed only for hardware task switches, which the Linux kernel does not use. [2]

Exceptions related to the TSS

The x86 exception vector 10 is called the Invalid TSS exception (#TS). The processor raises it whenever something goes wrong with a TSS access. For example, if an interrupt happens at CPL=3 and transfers control to CPL=0, the TSS is used to extract SS0 and ESP0/RSP0 for the stack switch. If the task register holds a bad TSS selector, a #TS fault will be generated. The Invalid TSS exception should never happen during normal operating system operation and is always related to kernel bugs or hardware failure.

For more details on TSS exceptions, see Volume 3a, Chapter 6 of the IA-32 manual. [3]

TSS in x86-64 mode

The x86-64 architecture does not support hardware task switches. However, the TSS can still be used in a machine running in the 64-bit extended modes. In these modes the TSS is still useful as it stores:

  1. The stack pointer addresses for each privilege level.
  2. Pointer addresses for the Interrupt Stack Table (the inner-level stack pointers section above discusses the need for this).
  3. The offset address of the I/O permission bitmap.

Also, the task register is expanded in these modes to be able to hold a 64-bit base address.
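
For illustration, the 64-bit TSS holding these three groups of fields can be written as a C structure. The offsets follow the layout given in the Intel and AMD manuals; the names are assumptions made for this sketch, and `__attribute__((packed))` is a GCC/Clang extension, needed here because the 64-bit fields start at offset 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the 64-bit TSS. Field names are illustrative only. */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];        /* RSP0..RSP2: stack pointers per privilege level */
    uint64_t reserved1;
    uint64_t ist[7];        /* IST1..IST7: Interrupt Stack Table */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iopb_offset;   /* offset of the I/O permission bitmap */
} __attribute__((packed));

/* Compile-time checks that the sketch matches the documented offsets. */
_Static_assert(sizeof(struct tss64) == 104, "64-bit TSS is 104 bytes");
_Static_assert(offsetof(struct tss64, rsp) == 4, "RSP0 at offset 4");
_Static_assert(offsetof(struct tss64, ist) == 36, "IST1 at offset 36");
_Static_assert(offsetof(struct tss64, iopb_offset) == 102, "IOPB at offset 102");
```

None of the saved-register fields of the 32-bit layout remain, which reflects the absence of hardware task switching in these modes.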

References

  1. Bovet, Daniel Pierre; Cesati, Marco (2006). Understanding the Linux Kernel, Third Edition. O'Reilly Media. p. 104. ISBN 978-0-596-00565-8. Retrieved 2009-11-23.
  2. Bovet, Daniel P.; Cesati, Marco (2006). Understanding the Linux Kernel. O'Reilly. p. 104. ISBN 9780596554910. Retrieved 2014-02-25.
  3. "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3a". Retrieved 21 May 2012.