Processor affinity


Processor affinity, also called CPU pinning or "cache affinity", enables the binding and unbinding of a process or a thread to a central processing unit (CPU) or a range of CPUs, so that the process or thread will execute only on the designated CPU or CPUs rather than on any CPU. This can be viewed as a modification of the native central queue scheduling algorithm in a symmetric multiprocessing operating system. Each item in the queue has a tag indicating its kin processor. At the time of resource allocation, each task is allocated to its kin processor in preference to others.


Processor affinity takes advantage of the fact that remnants of a process that was run on a given processor may remain in that processor's state (for example, data in the cache memory) even after another process has run on that processor. Scheduling a CPU-intensive process that has few interrupts to execute on the same processor may improve its performance by reducing performance-degrading events such as cache misses, but may slow down ordinary programs because they would need to wait for that CPU to become available again. [1] A practical example of processor affinity is executing multiple instances of a non-threaded application, such as some graphics-rendering software. [citation needed]

Scheduling-algorithm implementations vary in how strictly they adhere to processor affinity. Under certain circumstances, some implementations will allow a task to move to another processor if this results in higher efficiency. For example, when two processor-intensive tasks (A and B) have affinity to one processor while another processor remains unused, many schedulers will shift task B to the second processor in order to maximize processor use. Task B will then acquire affinity with the second processor, while task A will continue to have affinity with the original processor.[citation needed]

Usage

Processor affinity can effectively reduce cache problems, but it does not reduce the persistent load-balancing problem. [2] Processor affinity also becomes more complicated in systems with non-uniform architectures. For example, a system with two dual-core hyper-threaded CPUs presents a challenge to a scheduling algorithm.

There is complete affinity between two virtual CPUs implemented on the same core via hyper-threading, partial affinity between two cores on the same physical processor (as the cores share some, but not all, cache), and no affinity between separate physical processors. As other resources are also shared, processor affinity alone cannot be used as the basis for CPU dispatching. If a process has recently run on one virtual hyper-threaded CPU in a given core, and that virtual CPU is currently busy but its partner CPU is not, cache affinity would suggest that the process should be dispatched to the idle partner CPU. However, the two virtual CPUs compete for essentially all computing, cache, and memory resources. In this situation, it would typically be more efficient to dispatch the process to a different core or CPU, if one is available. This could incur a penalty when the process repopulates the cache, but overall performance could be higher because the process would not have to compete for resources within the CPU.[citation needed]

Specific operating systems

On Linux, the CPU affinity of a process can be altered with the taskset(1) program [3] and the sched_setaffinity(2) system call. The affinity of a thread can be altered with one of the library functions: pthread_setaffinity_np(3) or pthread_attr_setaffinity_np(3).

On SGI systems, dplace binds a process to a set of CPUs. [4]

On DragonFly BSD 1.9 (2007) and later versions, the usched_set system call can be used to control the affinity of a process. [5] [6] NetBSD 5.0, FreeBSD 7.2, DragonFly BSD 4.7 and later versions can use pthread_setaffinity_np and pthread_getaffinity_np. [7] In NetBSD, the psrset utility [8] can be used to set a thread's affinity to a certain CPU set. In FreeBSD, the cpuset [9] utility is used to create CPU sets and to assign processes to these sets. In DragonFly BSD 3.1 (2012) and later, the usched utility can be used to assign processes to a certain CPU set. [10]

On Windows NT and its successors, thread and process CPU affinities can be set separately by using SetThreadAffinityMask [11] and SetProcessAffinityMask [12] API calls or via the Task Manager interface (for process affinity only).

macOS exposes an affinity API [13] that provides hints to the kernel about how to schedule threads according to affinity sets.

On Solaris, it is possible to control the bindings of processes and LWPs (lightweight processes) to processors using the pbind(1) [14] program. To control the affinity programmatically, processor_bind(2) [15] can be used. More generic interfaces are also available, such as pset_bind(2) [16] and lgrp_affinity_get(3LGRP) [17], which are based on the concepts of processor sets and locality groups.

On AIX it is possible to control bindings of processes using the bindprocessor command [18] [19] and the bindprocessor API. [18] [20]


References

  1. "Processor affinity and binding". Retrieved 2021-06-08.
  2. "White Paper - Processor Affinity". tmurgent.com. Accessed 2007-07-06.
  3. taskset(1). Linux User Manual, User Commands.
  4. dplace.1. sgi.com. Archived 2007-07-01 at the Wayback Machine. Accessed 2007-07-06.
  5. "usched_set(2) - setting up a proc's usched". DragonFly System Calls Manual. DragonFly BSD. Retrieved 2019-07-28.
  6. "kern/kern_usched.c § sys_usched_set". BSD Cross Reference. DragonFly BSD. Retrieved 2019-07-28.
  7. pthread_setaffinity_np(3). NetBSD, FreeBSD and DragonFly BSD Library Functions Manual.
  8. psrset(8). NetBSD System Manager's Manual.
  9. cpuset(1). FreeBSD General Commands Manual.
  10. "usched(8) - run a program with a specified userland scheduler and cpumask". DragonFly System Manager's Manual. DragonFly BSD. Retrieved 2019-07-28.
  11. SetThreadAffinityMask. MSDN Library.
  12. SetProcessAffinityMask. MSDN Library.
  13. "Thread Affinity API Release Notes". Developer.apple.com.
  14. pbind(1M). Solaris man page.
  15. processor_bind(2). Solaris man page.
  16. pset_bind(2). Oracle Solaris 11.1 Information Library, man pages section 2.
  17. lgrp_affinity_get(3LGRP). Memory and Thread Placement Optimization Developer's Guide.
  18. Umesh Prabhakar Gaikwad; Kailas S. Zadbuke (November 16, 2006). "Processor affinity on AIX".
  19. "bindprocessor Command". IBM.
  20. "bindprocessor Subroutine". IBM.