Original author(s) | Alexei Starovoitov, Daniel Borkmann [1] [2] |
---|---|
Developer(s) | Open source community, Meta, Google, Isovalent, Microsoft, Netflix [1] |
Initial release | 2014[3] |
Repository | Linux: git Windows: github |
Written in | C |
Operating system | Linux, Windows [4] |
Type | Runtime system |
License | Linux: GPL Windows: MIT License |
Website | ebpf.io |
eBPF is a technology that can run programs in a privileged context such as the operating system kernel. [5] It is the successor to the Berkeley Packet Filter (BPF, with the "e" originally meaning "extended") filtering mechanism in Linux and is also used in other parts of the Linux kernel as well.
It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules. [6] Safety is provided through an in-kernel verifier which performs static code analysis and rejects programs which crash, hang or otherwise interfere with the kernel negatively. [7] [8]
This validation model differs from sandboxed environments, where the execution environment is restricted and the runtime has no insight about the program. [9] Examples of programs that are automatically rejected are programs without strong exit guarantees (i.e. for/while loops without exit conditions) and programs dereferencing pointers without safety checks. [10]
Loaded programs which passed the verifier are either interpreted or in-kernel just-in-time compiled (JIT compiled) for native execution performance. The execution model is event-driven and with few exceptions run-to-completion, [2] meaning, programs can be attached to various hook points in the operating system kernel and are run upon triggering of an event. eBPF use cases include (but are not limited to) networking such as XDP, tracing and security subsystems. [5] Given eBPF's efficiency and flexibility opened up new possibilities to solve production issues, Brendan Gregg famously dubbed eBPF "superpowers for Linux". [11] Linus Torvalds said, "BPF has actually been really useful, and the real power of it is how it allows people to do specialized code that isn't enabled until asked for". [12] Due to its success in Linux, the eBPF runtime has been ported to other operating systems such as Windows. [4]
eBPF was built on top of the Berkeley Packet Filter (cBPF). At the lowest level, it introduced the use of ten 64-bit registers (instead of two 32-bit long registers for cBPF), different jump semantics, a call instruction and corresponding register passing convention, new instructions, and a different encoding for these instructions. [13]
Date | Event |
---|---|
April 2011 | The first in-kernel Linux just-in-time compiler (JIT compiler) for the classic Berkeley Packet Filter got merged. [14] |
January 2012 | The first non-networking use case of the classic Berkeley Packet Filter, seccomp-bpf, [15] appeared; it allows filtering of system calls using a configurable policy implemented through BPF instructions. |
March 2014 | David S. Miller, primary maintainer of the Linux networking stack, accepted the rework of the old in-kernel BPF interpreter. It was replaced by an eBPF interpreter and the Linux kernel internally translates classic BPF (cBPF) into eBPF instructions. [16] |
March 2015 | The ability to attach eBPF to kprobes as first tracing use case was merged. [18] In the same month, initial infrastructure work got accepted to attach eBPF to the networking traffic control (tc) layer allowing to attach eBPF to the core ingress and later also egress paths of the network stack, later heavily used by projects such as Cilium. [19] [20] [21] |
August 2015 | The eBPF compiler backend got merged into LLVM 3.7.0 release. [22] |
September 2015 | Brendan Gregg announced a collection of new eBPF-based tracing tools as the bcc project, providing a front-end for eBPF to make it easier to write programs. [23] |
July 2016 | eBPF got the ability to be attached into network driver's core receive path. This layer is known today as eXpress DataPath (XDP) and was added as a response to DPDK to create a fast data path which works in combination with the Linux kernel rather than bypassing it. [24] [25] [26] |
August 2016 | Cilium was initially announced during LinuxCon as a project providing fast IPv6 container networking with eBPF and XDP. Today, Cilium has been adopted by major cloud provider's Kubernetes offerings and is one of the most widely used CNIs. [27] [21] [28] |
November 2016 | Netronome added offload of eBPF programs for XDP and tc BPF layer to their NIC. [29] |
May 2017 | Meta's layer 4 load-balancer, Katran, went live. Every packet towards facebook.com since then has been processed by eBPF & XDP. [30] |
November 2017 | eBPF becomes its own kernel subsystem to ease the continuously growing kernel patch management. The first pull request by eBPF maintainers was submitted. [31] |
September 2017 | Bpftool was added to the Linux kernel as a user space utility to introspect the eBPF subsystem. [32] |
January 2018 | A new socket family called AF_XDP was published, allowing for high performance packet processing with zero-copy semantics at the XDP layer. [33] Today, DPDK has an official AF_XDP poll-mode driver support. [34] |
February 2018 | The bpfilter prototype has been published, allowing translation of a subset of iptables rulesets into eBPF via a newly developed user mode driver. The work has caused controversies due to the ongoing nftables development effort and has not been merged into mainline. [35] [36] |
October 2018 | The new bpftrace tool has been announced by Brendan Gregg as DTrace 2.0 for Linux. [37] |
November 2018 | eBPF introspection has been added for kTLS in order to support the ability for in-kernel TLS policy enforcement. [38] |
November 2018 | BTF (BPF Type Format) has been added to the Linux kernel as an efficient meta data format which is approximately 100x smaller in size than DWARF. [39] |
December 2019 | The first 880-page long book on BPF, written by Brendan Gregg, was released. [40] |
March 2020 | Google upstreamed BPF LSM support into the Linux kernel, enabling programmable Linux Security Modules (LSMs) through eBPF. [41] |
September 2020 | The eBPF compiler backend for GNU Compiler Collection (GCC) was merged. [42] |
The alias eBPF is often interchangeably used with BPF, [2] [43] for example by the Linux kernel community. eBPF and BPF is referred to as a technology name like LLVM. [2] eBPF evolved from the Berkeley Packet Filter as an extended version, but its use case outgrew networking, and today eBPF as a pseudo-acronym is preferred. [2]
The bee is the official logo for eBPF. At the first eBPF Summit there was a vote taken and the bee mascot was named "eBee". [44] [45] The logo has originally been created by Vadim Shchekoldin. [45] Earlier unofficial eBPF mascots have existed in the past, [46] but haven't seen widespread adoption.
The eBPF Foundation was created in August 2021 with the goal to expand the contributions being made to extend the powerful capabilities of eBPF and grow beyond Linux. [1] Founding members include Meta, Google, Isovalent, Microsoft and Netflix. The purpose is to raise, budget and spend funds in support of various open source, open data and/or open standards projects relating to eBPF technologies [47] to further drive the growth and adoption of the eBPF ecosystem. Since inception, Red Hat, Huawei, Crowdstrike, Tigera, DaoCloud, Datoms, FutureWei also joined. [48]
eBPF has been adopted by a number of large-scale production users, for example:
Due to the ease of programmability, eBPF has been used as a tool for implementing microarchitectural timing side-channel attacks such as Spectre against vulnerable microprocessors. [93] While unprivileged eBPF implemented mitigations against transient execution attacks, [94] unprivileged use has ultimately been disabled by the kernel community by default to protect from use against future hardware vulnerabilities. [95]
Free and Open source Software Developers' European Meeting (FOSDEM) is a non-commercial, volunteer-organized European event centered on free and open-source software development. It is aimed at developers and anyone interested in the free and open-source software movement. It aims to enable developers to meet and to promote the awareness and use of free and open-source software.
Linux Virtual Server (LVS) is load balancing software for Linux kernel–based operating systems.
seccomp is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit
, sigreturn
, read
and write
to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.
OS-level virtualization is an operating system (OS) virtualization paradigm in which the kernel allows the existence of multiple isolated user space instances, called containers, zones, virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels, or jails. Such instances may look like real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can see all resources of that computer. However, programs running inside of a container can only see the container's contents and devices assigned to the container.
The following is a timeline of virtualization development. In computing, virtualization is the use of a computer to simulate another computer. Through virtualization, a host simulates a guest by exposing virtual hardware devices, which may be done through software or by allowing access to a physical device connected to the machine.
The Berkeley Packet Filter is a network tap and packet filter which permits computer network packets to be captured and filtered at the operating system level. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received, and allows a userspace process to supply a filter program that specifies which packets it wants to receive. For example, a tcpdump process may want to receive only packets that initiate a TCP connection. BPF returns only packets that pass the filter that the process supplies. This avoids copying unwanted packets from the operating system kernel to the process, greatly improving performance. The filter program is in the form of instructions for a virtual machine, which are interpreted, or compiled into machine code by a just-in-time (JIT) mechanism and executed, in the kernel.
Microsoft Azure, often referred to as Azure, is a cloud computing platform developed by Microsoft. It offers access, management, and the development of applications and services through global data centers. It also provides a range of capabilities, including software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS). Microsoft Azure supports many programming languages, tools, and frameworks, including Microsoft-specific and third-party software and systems.
The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally written in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU operating system, which was written to be a free (libre) replacement for Unix.
cgroups is a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.
netsniff-ng is a free Linux network analyzer and networking toolkit originally written by Daniel Borkmann. Its gain of performance is reached by zero-copy mechanisms for network packets, so that the Linux kernel does not need to copy packets from kernel space to user space via system calls such as recvmsg
. libpcap, starting with release 1.0.0, also supports the zero-copy mechanism on Linux for capturing (RX_RING), so programs using libpcap also use that mechanism on Linux.
A network scheduler, also called packet scheduler, queueing discipline (qdisc) or queueing algorithm, is an arbiter on a node in a packet switching communication network. It manages the sequence of network packets in the transmit and receive queues of the protocol stack and network interface controller. There are several network schedulers available for the different operating systems, that implement many of the existing network scheduling algorithms.
Google Cloud Platform (GCP), offered by Google, is a suite of cloud computing services that provides a series of modular cloud services including computing, data storage, data analytics, and machine learning, alongside a set of management tools. It runs on the same infrastructure that Google uses internally for its end-user products, such as Google Search, Gmail, and Google Docs, according to Verma, et.al. Registration requires a credit card or bank account details.
Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.
Container Linux is a discontinued open-source lightweight operating system based on the Linux kernel and designed for providing infrastructure for clustered deployments. One of its focuses was scalability. As an operating system, Container Linux provided only the minimal functionality required for deploying applications inside software containers, together with built-in mechanisms for service discovery and configuration sharing.
Microsoft, a technology company historically known for its opposition to the open source software paradigm, turned to embrace the approach in the 2010s. From the 1970s through 2000s under CEOs Bill Gates and Steve Ballmer, Microsoft viewed the community creation and sharing of communal code, later to be known as free and open source software, as a threat to its business, and both executives spoke negatively against it. In the 2010s, as the industry turned towards cloud, embedded, and mobile computing—technologies powered by open source advances—CEO Satya Nadella led Microsoft towards open source adoption although Microsoft's traditional Windows business continued to grow throughout this period generating revenues of 26.8 billion in the third quarter of 2018, while Microsoft's Azure cloud revenues nearly doubled.
XDP is an eBPF-based high-performance data path used to send and receive network packets at high rates by bypassing most of the operating system networking stack. It is merged in the Linux kernel since version 4.8. This implementation is licensed under GPL. Large technology firms including Amazon, Google and Intel support its development. Microsoft released their free and open source implementation XDP for Windows in May 2022. It is licensed under MIT License.
The Cloud Native Computing Foundation (CNCF) is a Linux Foundation project that was started in 2015 to help advance container technology and align the tech industry around its evolution.
io_uring is a Linux kernel system call interface for storage device asynchronous I/O operations addressing performance issues with similar interfaces provided by functions like read
/write
or aio_read
/aio_write
etc. for operations on data accessed by file descriptors.
Azure Linux, previously known as CBL-Mariner, is a free and open-source Linux distribution that Microsoft has developed. It is the base container OS for Microsoft Azure services and the graphical component of WSL 2.
Cilium is a cloud native technology for networking, observability, and security. It is based on the kernel technology eBPF, originally for better networking performance, and now leverages many additional features for different use cases. The core networking component has evolved from only providing a flat Layer 3 network for containers to including advanced networking features, like BGP and Service mesh, within a Kubernetes cluster, across multiple clusters, and connecting with the world outside Kubernetes. Hubble was created as the network observability component and Tetragon was later added for security observability and runtime enforcement. Cilium runs on Linux and is one of the first eBPF applications being ported to Microsoft Windows through the eBPF on Windows project.