Virtual thread

In computer programming, a virtual thread is a thread that is managed by a runtime library or virtual machine (VM) and made to resemble a "real" operating system thread to the code executing on it, while requiring substantially fewer resources than the latter.

Virtual threads allow for tens of millions of preemptive tasks and events on a 2021 consumer-grade computer, [1] compared to the low thousands of operating system threads. [2] Preemptive execution [3] is important both for the performance gains of parallelism and for fast response times across tens of millions of events.
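
The scale is easiest to see in Go, whose goroutines are virtual threads. The following minimal sketch starts one million goroutines and parks them concurrently; the count and the trivial workload are illustrative, and should be lowered on memory-constrained machines.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 1_000_000 // illustrative; needs roughly a few GiB of RAM
	var wg sync.WaitGroup
	release := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-release // park until released; all n goroutines coexist
		}()
	}
	close(release) // release every parked goroutine at once
	wg.Wait()
	fmt.Println("goroutines completed:", n)
}
```

Attempting the same with one OS thread per task would exhaust typical per-process thread limits long before reaching a million.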

Earlier constructs that are not preemptive, or not always preemptive, such as coroutines, green threads, or the largely single-threaded Node.js, introduce delays in responding to asynchronous events such as every incoming request in a server application. [4]

Definition

Virtual threads were commercialized with Google's Chrome browser in 2008, [5] in which a virtual thread may hop between operating system threads as it executes. Virtual threads are truly virtual: they are created and scheduled entirely in user-space software.

Underlying reasons

Java servers have long featured extensive and memory-consuming software constructs that allow dozens of pooled operating system threads to preemptively execute thousands of requests per second without the use of virtual threads. Key to performance here is reducing the initial latency of request processing and minimizing the time for which operating system threads are blocked. [7]
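
This pooled pattern predates virtual threads. As a hedged sketch (written in Go rather than Java, with illustrative names and sizes), a small fixed pool of workers, standing in for pooled OS threads, drains a queue of many short requests:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const workers = 8
	requests := make(chan int, 100)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range requests {
				_ = req * req // placeholder for request handling
			}
		}()
	}

	for i := 0; i < 100; i++ {
		requests <- i
	}
	close(requests)
	wg.Wait()
	fmt.Println("all requests served by", workers, "pooled workers")
}
```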

Virtual threads increase the possible concurrency by many orders of magnitude, while the actual parallelism achieved is limited by the execution units and pipelining offered by the available processors and processor cores. In 2021, consumer-grade computers typically offered a parallelism of tens of concurrent execution units. [8] For increased performance through parallelism, the language runtime needs to use all present hardware, [9] rather than being single-threaded or featuring global synchronization such as a global interpreter lock.
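
In Go, for example, the available hardware parallelism can be queried at run time; the scheduler uses up to GOMAXPROCS execution units simultaneously:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	// Passing 0 reads the current setting without changing it.
	fmt.Println("GOMAXPROCS: ", runtime.GOMAXPROCS(0))
}
```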

The increase of many orders of magnitude in the number of possible preemptive tasks is achieved by the language runtime managing resizable thread stacks. [10] Those stacks are smaller than those of operating system threads, and the maximum number of virtual threads possible without swapping is proportional to the amount of main memory. [11]
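
The memory cost per virtual thread can be estimated empirically. The following Go sketch parks 100,000 goroutines and compares the stack memory obtained from the operating system before and after; exact figures vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	const n = 100_000
	block := make(chan struct{}) // never closed; keeps goroutines parked

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		go func() { <-block }()
	}
	runtime.ReadMemStats(&after)

	fmt.Printf("approx. %d bytes of stack per goroutine\n",
		(after.StackSys-before.StackSys)/n)
}
```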

To support virtual threads efficiently, the language runtime has to be largely rewritten, both to prevent blocking calls from holding up the operating system thread assigned to execute a virtual thread [12] and to manage thread stacks. [13] An example of a retrofit of virtual threads is Java's Project Loom. [14] An example of a language designed from the start for virtual threads is Go. [15]

Complexity

Because virtual threads offer parallelism, the programmer needs to be skilled in multi-threaded programming and synchronization.
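
For example, shared state needs the same locking discipline as with OS threads. A minimal Go sketch, in which a mutex guards a counter incremented from a thousand goroutines:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // without the lock, the increments would race
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("counter:", counter) // reliably 1000
}
```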

Because a blocked virtual thread would block the OS thread it occupies at that moment, the runtime must take considerable care in handling blocking system calls. Typically, a thread from a pool of spare OS threads is used to execute the blocking call on behalf of the virtual thread, so that the OS thread that was initially executing it is not blocked.
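
The observable effect is that other virtual threads keep running while one is blocked. In the following Go sketch, one goroutine blocks reading from a pipe that never delivers data (depending on the kind of call, the Go runtime either parks the goroutine or hands the OS thread's work to another thread), while the main goroutine continues:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer w.Close() // keep the write end open so the read stays blocked

	go func() {
		buf := make([]byte, 1)
		r.Read(buf) // blocks indefinitely: no data ever arrives
	}()

	for i := 1; i <= 3; i++ {
		fmt.Println("still making progress:", i) // unaffected by the block
		time.Sleep(100 * time.Millisecond)
	}
}
```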

Managing the virtual thread stack requires care in the compiler and linker, as well as short-range predictions of additional stack space requirements so that stacks can be grown on demand.
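
In Go, stack growth is triggered by checks at function entry and is invisible to the programmer. The sketch below recurses far beyond what a few-kibibyte initial stack could hold; the depth and frame padding are arbitrary values chosen for illustration.

```go
package main

import "fmt"

// pad is passed by value to enlarge each stack frame.
func depth(n int, pad [64]byte) int {
	if n == 0 {
		return int(pad[0]) // always 0; keeps pad from being optimized away
	}
	return 1 + depth(n-1, pad)
}

func main() {
	done := make(chan int)
	go func() {
		// Far deeper than the initial goroutine stack allows; the
		// runtime grows the stack by copying as needed.
		done <- depth(100_000, [64]byte{})
	}()
	fmt.Println("recursed to depth:", <-done)
}
```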

Implementations

Google Chrome Browser

Chrome uses virtual threads to serialize singleton input/output activities. While a virtual thread is executing, it may hop between different OS threads. The Chrome browser first appeared in 2008; its virtual threads are available to developers extending the browser.
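
Chrome itself is implemented in C++; the following is not Chrome's API but an analogous serialization pattern sketched in Go, where a single goroutine owns the singleton activity and a channel delivers requests to it strictly in order, regardless of which OS thread the goroutine occupies:

```go
package main

import "fmt"

func main() {
	requests := make(chan string)
	done := make(chan struct{})

	// One goroutine owns the serialized activity; requests sent to it
	// are handled one at a time, in order.
	go func() {
		for req := range requests {
			fmt.Println("handling:", req)
		}
		close(done)
	}()

	for _, r := range []string{"read", "write", "flush"} {
		requests <- r
	}
	close(requests)
	<-done
}
```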

Go

Go's goroutines are a prominent implementation of virtual threads; they became fully preemptive with Go 1.14 in 2020. [3]
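
A minimal example: the go statement starts a virtual thread whose stack, scheduling, and preemption are managed by the Go runtime rather than the operating system.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan string)
	go func() { // starts a new goroutine (virtual thread)
		time.Sleep(50 * time.Millisecond) // simulate work
		ch <- "done"
	}()
	fmt.Println(<-ch) // parks only this goroutine, not an OS thread
}
```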

Java Project Loom

Project Loom's virtual threads are a lightweight, user-mode-scheduled alternative to standard OS-managed threads, and are mapped to OS threads in a many-to-many relationship. Work on Project Loom at Oracle started in 2017. Loom has the goal of implementing virtual threads for performance while simplifying thread handling across OS threads, concurrent threads, and virtual threads. As of 2022, virtual threads are available as a preview feature in JDK 19.

Other uses of the term

In 2007, Intel referred to a compiler-specific optimization technique as virtual threads. [16]

See also

  - Garbage collection (computer science)
  - Computer multitasking
  - Java virtual machine
  - Process (computing)
  - Thread (computing)
  - Parallel computing
  - Coroutine
  - OpenMP
  - Task (computing)
  - Runtime system
  - Concurrent computing
  - Binary Modular Dataflow Machine
  - Fiber (computer science)
  - Cooperative multitasking
  - Green thread
  - Java performance
  - Task parallelism
  - Multithreading (computer architecture)
  - Parallel Extensions
  - Work stealing

References

  1. Rudell, Harald (2022-03-19). "massivevirtualparallelism".
  2. baeldung (2022-01-02). "Maximum Number of Threads Per Process in Linux | Baeldung on Linux". www.baeldung.com. Retrieved 2022-03-30.
  3. "Go 1.14 Release Notes - The Go Programming Language". go.dev. Retrieved 2022-03-30.
  4. Node.js. "Don't Block the Event Loop (or the Worker Pool)". Node.js. Retrieved 2022-03-30.
  5. "Threading and Tasks in Chrome". chromium.googlesource.com. Retrieved 2022-04-05.
  6. Lu, Genchi (2021-07-22). "Java's Thread Model and Golang Goroutine". Medium. Retrieved 2022-04-05.
  7. "Principles to Handle Thousands of Connections in Java Using Netty - DZone Performance". dzone.com. Retrieved 2022-03-30.
  8. "MacBook Pro 14-inch and MacBook Pro 16-inch". Apple. Retrieved 2022-03-30.
  9. "Frequently Asked Questions (FAQ) - The Go Programming Language". go.dev. Retrieved 2022-03-30.
  10. "JEP draft: Virtual Threads (Preview)". openjdk.java.net. Retrieved 2022-03-30.
  11. Rudell, Harald (2022-03-22). "Maximum number of virtual threads in Go".
  12. Szczukocki, Denis (2020-03-18). "Difference Between Thread and Virtual Thread in Java | Baeldung". www.baeldung.com. Retrieved 2022-03-30.
  13. "Why you can have millions of Goroutines but only thousands of Java Threads". rcoh.me. 2018-04-12. Retrieved 2022-03-30.
  14. "Main - Main - OpenJDK Wiki". wiki.openjdk.java.net. Retrieved 2022-03-30.
  15. "The Go Programming Language". go.dev. 2022-03-22. Retrieved 2022-03-30.
  16. "Intel Technology Journal" (PDF).