Virtual thread

In computer programming, a virtual thread is a thread that is managed by a runtime library or virtual machine (VM) and made to resemble a "real" operating system thread to the code executing on it, while requiring substantially fewer resources than the latter.

Virtual threads allow for tens of millions of preemptive tasks and events on a 2021 consumer-grade computer, [1] compared to the low thousands of operating system threads. [2] Preemptive execution [3] is important both for the performance gains of parallelism and for fast response times across tens of millions of events.
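
The scale is easiest to see in Go, whose goroutines are virtual threads. The following minimal sketch starts one million goroutines and parks them concurrently; the count and the trivial workload are illustrative, and should be lowered on memory-constrained machines.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const n = 1_000_000 // illustrative; needs roughly a few GiB of RAM
	var wg sync.WaitGroup
	release := make(chan struct{})
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-release // park until released; all n goroutines coexist
		}()
	}
	close(release) // release every parked goroutine at once
	wg.Wait()
	fmt.Println("goroutines completed:", n)
}
```

Attempting the same with one OS thread per task would exhaust typical per-process thread limits long before reaching a million.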

Earlier constructs that are not preemptive, or not always preemptive, such as coroutines, green threads, or the largely single-threaded Node.js, introduce delays in responding to asynchronous events such as every incoming request in a server application. [4]

Definition

Virtual threads were commercialized with Google's Chrome browser in 2008, [5] in which a virtual thread may hop between operating system threads as it executes. Virtual threads are truly virtual: they are created and scheduled entirely in user-space software.

Underlying reasons

Java servers have long featured extensive and memory-consuming software constructs that allow dozens of pooled operating system threads to preemptively execute thousands of requests per second without the use of virtual threads. Key to performance here is reducing the initial latency of request processing and minimizing the time for which operating system threads are blocked. [7]
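
This pooled pattern predates virtual threads. As a hedged sketch (written in Go rather than Java, with illustrative names and sizes), a small fixed pool of workers, standing in for pooled OS threads, drains a queue of many short requests:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	const workers = 8
	requests := make(chan int, 100)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for req := range requests {
				_ = req * req // placeholder for request handling
			}
		}()
	}

	for i := 0; i < 100; i++ {
		requests <- i
	}
	close(requests)
	wg.Wait()
	fmt.Println("all requests served by", workers, "pooled workers")
}
```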

Virtual threads increase the possible concurrency by many orders of magnitude, while the actual parallelism achieved is limited by the execution units and pipelining offered by the available processors and processor cores. In 2021, consumer-grade computers typically offered a parallelism of tens of concurrent execution units. [8] For increased performance through parallelism, the language runtime needs to use all present hardware, [9] rather than being single-threaded or featuring global synchronization such as a global interpreter lock.
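
In Go, for example, the available hardware parallelism can be queried at run time; the scheduler uses up to GOMAXPROCS execution units simultaneously:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	// Passing 0 reads the current setting without changing it.
	fmt.Println("GOMAXPROCS: ", runtime.GOMAXPROCS(0))
}
```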

The increase of many orders of magnitude in the number of possible preemptive tasks is achieved by the language runtime managing resizable thread stacks. [10] Those stacks are smaller than those of operating system threads, and the maximum number of virtual threads possible without swapping is proportional to the amount of main memory. [11]
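
The memory cost per virtual thread can be estimated empirically. The following Go sketch parks 100,000 goroutines and compares the stack memory obtained from the operating system before and after; exact figures vary by Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	const n = 100_000
	block := make(chan struct{}) // never closed; keeps goroutines parked

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		go func() { <-block }()
	}
	runtime.ReadMemStats(&after)

	fmt.Printf("approx. %d bytes of stack per goroutine\n",
		(after.StackSys-before.StackSys)/n)
}
```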

To support virtual threads efficiently, the language runtime has to be largely rewritten, both to prevent blocking calls from holding up the operating system thread assigned to execute a virtual thread [12] and to manage thread stacks. [13] An example of a retrofit of virtual threads is Java's Project Loom. [14] An example of a language designed from the start for virtual threads is Go. [15]

Complexity

Because virtual threads offer parallelism, the programmer needs to be skilled in multi-threaded programming and synchronization.
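
For example, shared state needs the same locking discipline as with OS threads. A minimal Go sketch, in which a mutex guards a counter incremented from a thousand goroutines:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var (
		mu      sync.Mutex
		counter int
		wg      sync.WaitGroup
	)
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock() // without the lock, the increments would race
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("counter:", counter) // reliably 1000
}
```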

Because a blocked virtual thread would block the OS thread it occupies at that moment, the runtime must take considerable care in handling blocking system calls. Typically, a thread from a pool of spare OS threads is used to execute the blocking call on behalf of the virtual thread, so that the OS thread that was initially executing it is not blocked.
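
The observable effect is that other virtual threads keep running while one is blocked. In the following Go sketch, one goroutine blocks reading from a pipe that never delivers data (depending on the kind of call, the Go runtime either parks the goroutine or hands the OS thread's work to another thread), while the main goroutine continues:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	defer w.Close() // keep the write end open so the read stays blocked

	go func() {
		buf := make([]byte, 1)
		r.Read(buf) // blocks indefinitely: no data ever arrives
	}()

	for i := 1; i <= 3; i++ {
		fmt.Println("still making progress:", i) // unaffected by the block
		time.Sleep(100 * time.Millisecond)
	}
}
```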

Managing the virtual thread stack requires care in the compiler and linker, as well as short-range predictions of additional stack space requirements so that stacks can be grown on demand.
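
In Go, stack growth is triggered by checks at function entry and is invisible to the programmer. The sketch below recurses far beyond what a few-kibibyte initial stack could hold; the depth and frame padding are arbitrary values chosen for illustration.

```go
package main

import "fmt"

// pad is passed by value to enlarge each stack frame.
func depth(n int, pad [64]byte) int {
	if n == 0 {
		return int(pad[0]) // always 0; keeps pad from being optimized away
	}
	return 1 + depth(n-1, pad)
}

func main() {
	done := make(chan int)
	go func() {
		// Far deeper than the initial goroutine stack allows; the
		// runtime grows the stack by copying as needed.
		done <- depth(100_000, [64]byte{})
	}()
	fmt.Println("recursed to depth:", <-done)
}
```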

Implementations

Google Chrome Browser

Chrome uses virtual threads to serialize singleton input/output activities. While a virtual thread is executing, it may hop between different OS threads. The Chrome browser first appeared in 2008; its virtual threads are available to developers extending the browser.
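
Chrome itself is implemented in C++; the following is not Chrome's API but an analogous serialization pattern sketched in Go, where a single goroutine owns the singleton activity and a channel delivers requests to it strictly in order, regardless of which OS thread the goroutine occupies:

```go
package main

import "fmt"

func main() {
	requests := make(chan string)
	done := make(chan struct{})

	// One goroutine owns the serialized activity; requests sent to it
	// are handled one at a time, in order.
	go func() {
		for req := range requests {
			fmt.Println("handling:", req)
		}
		close(done)
	}()

	for _, r := range []string{"read", "write", "flush"} {
		requests <- r
	}
	close(requests)
	<-done
}
```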

Go

Go's goroutines are a prominent implementation of virtual threads; they became fully preemptive with Go 1.14 in 2020. [3]
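
A minimal example: the go statement starts a virtual thread whose stack, scheduling, and preemption are managed by the Go runtime rather than the operating system.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan string)
	go func() { // starts a new goroutine (virtual thread)
		time.Sleep(50 * time.Millisecond) // simulate work
		ch <- "done"
	}()
	fmt.Println(<-ch) // parks only this goroutine, not an OS thread
}
```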

Java Project Loom

Project Loom's virtual threads are a lightweight, user-mode-scheduled alternative to standard OS-managed threads, and are mapped to OS threads in a many-to-many relationship. Work on Project Loom at Oracle started in 2017. Loom has the goal of implementing virtual threads for performance while simplifying thread handling across OS threads, concurrent threads, and virtual threads. As of 2022, virtual threads are available as a preview feature in JDK 19.

Other uses of the term

In 2007, Intel referred to a compiler-specific optimization technique as virtual threads. [16]

See also

  - Garbage collection (computer science)
  - Computer multitasking
  - Java virtual machine
  - Process (computing)
  - Thread (computing)
  - Parallel computing
  - Coroutine
  - OpenMP
  - Task (computing)
  - Runtime system
  - Concurrent computing
  - Binary Modular Dataflow Machine
  - Fiber (computer science)
  - Cooperative multitasking
  - Green thread
  - Java performance
  - Task parallelism
  - Multithreading (computer architecture)
  - Parallel Extensions
  - Work stealing

References

  1. Rudell, Harald (2022-03-19). "massivevirtualparallelism".
  2. baeldung (2022-01-02). "Maximum Number of Threads Per Process in Linux | Baeldung on Linux". www.baeldung.com. Retrieved 2022-03-30.
  3. "Go 1.14 Release Notes - The Go Programming Language". go.dev. Retrieved 2022-03-30.
  4. Node.js. "Don't Block the Event Loop (or the Worker Pool)". Node.js. Retrieved 2022-03-30.
  5. "Threading and Tasks in Chrome". chromium.googlesource.com. Retrieved 2022-04-05.
  6. Lu, Genchi (2021-07-22). "Java's Thread Model and Golang Goroutine". Medium. Retrieved 2022-04-05.
  7. "Principles to Handle Thousands of Connections in Java Using Netty - DZone Performance". dzone.com. Retrieved 2022-03-30.
  8. "MacBook Pro 14-inch and MacBook Pro 16-inch". Apple. Retrieved 2022-03-30.
  9. "Frequently Asked Questions (FAQ) - The Go Programming Language". go.dev. Retrieved 2022-03-30.
  10. "JEP draft: Virtual Threads (Preview)". openjdk.java.net. Retrieved 2022-03-30.
  11. Rudell, Harald (2022-03-22). "Maximum number of virtual threads in Go".
  12. Szczukocki, Denis (2020-03-18). "Difference Between Thread and Virtual Thread in Java | Baeldung". www.baeldung.com. Retrieved 2022-03-30.
  13. "Why you can have millions of Goroutines but only thousands of Java Threads". rcoh.me. 2018-04-12. Retrieved 2022-03-30.
  14. "Main - Main - OpenJDK Wiki". wiki.openjdk.java.net. Retrieved 2022-03-30.
  15. "The Go Programming Language". go.dev. 2022-03-22. Retrieved 2022-03-30.
  16. "Intel Technology Journal" (PDF).