Berkeley Packet Filter

Last updated
Berkeley Packet Filter
Developer(s) Steven McCanne,
Van Jacobson
Initial releaseDecember 19, 1992;30 years ago (1992-12-19)
Operating system Unix-like (FreeBSD, OpenBSD, NetBSD, DragonFly BSD, macOS, Oracle Solaris 11 and later, AIX, Tru64, Linux, Orbis), Windows

The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. [1] In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined to other hosts.

Contents

BPF supports filtering packets, allowing a userspace process to supply a filter program that specifies which packets it wants to receive. For example, a tcpdump process may want to receive only packets that initiate a TCP connection. BPF returns only packets that pass the filter that the process supplies. This avoids copying unwanted packets from the operating system kernel to the process, greatly improving performance. The filter program is in the form of instructions for a virtual machine, which are interpreted, or compiled into machine code by a just-in-time (JIT) mechanism and executed, in the kernel.

BPF is sometimes used to refer to just the filtering mechanism, rather than to the entire interface. Some systems, such as Linux and Tru64 UNIX, provide a raw interface to the data link layer other than the BPF raw interface but use the BPF filtering mechanisms for that raw interface. The BPF filtering mechanism is available on most Unix-like operating systems.

The Linux kernel provides an extended version of the BPF filtering mechanism, called eBPF, which uses a JIT mechanism, and which is used for packet filtering, as well as for other purposes in the kernel. eBPF is also available for Microsoft Windows. [2]

BPF provides pseudo-devices that can be bound to a network interface; reads from the device will read buffers full of packets received on the network interface, and writes to the device will inject packets on the network interface.

In 2007, Robert Watson and Christian Peron added zero-copy buffer extensions to the BPF implementation in the FreeBSD operating system, [3] allowing kernel packet capture in the device driver interrupt handler to write directly to user process memory in order to avoid the requirement for two copies for all packet data received via the BPF device. While one copy remains in the receipt path for user processes, this preserves the independence of different BPF device consumers, as well as allowing the packing of headers into the BPF buffer rather than copying complete packet data. [4]

Filtering

BPF's filtering capabilities are implemented as an interpreter for a machine language for the BPF virtual machine, a 32-bit machine with fixed-length instructions, one accumulator, and one index register. Programs in that language can fetch data from the packet, perform arithmetic operations on data from the packet, and compare the results against constants or against data in the packet or test bits in the results, accepting or rejecting the packet based on the results of those tests.

BPF is often extended by "overloading" the load (ld) and store (str) instructions.

Traditional Unix-like BPF implementations can be used in userspace, despite being written for kernel-space. This is accomplished using preprocessor conditions.

Extensions and optimizations

Some projects use BPF instruction sets or execution techniques different from the originals.

Some platforms, including FreeBSD, NetBSD, and WinPcap, use a just-in-time (JIT) compiler to convert BPF instructions into native code in order to improve performance. Linux includes a BPF JIT compiler which is disabled by default.

Kernel-mode interpreters for that same virtual machine language are used in raw data link layer mechanisms in other operating systems, such as Tru64 Unix, and for socket filters in the Linux kernel and in the WinPcap and Npcap packet capture mechanism.

Since version 3.18, the Linux kernel includes an extended BPF virtual machine with ten 64-bit registers, termed extended BPF (eBPF). It can be used for non-networking purposes, such as for attaching eBPF programs to various tracepoints. [5] [6] [7] Since kernel version 3.19, eBPF filters can be attached to sockets, [8] [9] and, since kernel version 4.1, to traffic control classifiers for the ingress and egress networking data path. [10] [11] The original and obsolete version has been retroactively renamed to classic BPF (cBPF). Nowadays, the Linux kernel runs eBPF only and loaded cBPF bytecode is transparently translated into an eBPF representation in the kernel before program execution. [12] All bytecode is verified before running to prevent denial-of-service attacks. Until Linux 5.3, the verifier prohibited the use of loops, to prevent potentially unbounded execution times; loops with bounded execution time are now permitted in more recent kernels. [13]

A user-mode interpreter for BPF is provided with the libpcap/WinPcap/Npcap implementation of the pcap API, so that, when capturing packets on systems without kernel-mode support for that filtering mechanism, packets can be filtered in user mode; code using the pcap API will work on both types of systems, although, on systems where the filtering is done in user mode, all packets, including those that will be filtered out, are copied from the kernel to user space. That interpreter can also be used when reading a file containing packets captured using pcap.

Another user-mode interpreter is uBPF, which supports JIT and eBPF (without cBPF). Its code has been reused to provide eBPF support in non-Linux systems. [14] Microsoft's eBPF on Windows builds on uBPF and the PREVAIL formal verifier. [15] rBPF, a Rust rewrite of uBPF, is used by the Solana blockchain platform as the execution engine. [16]

Programming

Classic BPF is generally emitted by a program from some very high-level textual rule describing the pattern to match. One such representation is found in libpcap. [17] Classic BPF and eBPF can also be written either directly as machine code, or using an assembly language for a textual representation. Notable assemblers include Linux kernel's bpf_asm tool (cBPF), bpfc (cBPF), and the ubpf assembler (eBPF). The bpftool command can also act as a disassembler for both flavors of BPF. The assembly languages are not necessarily compatible with each other.

eBPF bytecode has recently become a target of higher-level languages. LLVM added eBPF support in 2014, and GCC followed in 2019. Both toolkits allow compiling C and other supported languages to eBPF. A subset of P4 can also be compiled into eBPF using BCC, an LLVM-based compiler kit. [18]

History

The original paper was written by Steven McCanne and Van Jacobson in 1992 while at Lawrence Berkeley Laboratory. [1] [19]

In August 2003, SCO Group publicly claimed that the Linux kernel was infringing Unix code which they owned. [20] Programmers quickly discovered that one example they gave was the Berkeley Packet Filter, which in fact SCO never owned. [21] SCO has not explained or acknowledged the mistake but the ongoing legal action may eventually force an answer. [22] [ needs update? ]

Security concerns

The Spectre attack could leverage the Linux kernel's eBPF interpreter or JIT compiler to extract data from other kernel processes. [23] A JIT hardening feature in the kernel mitigates this vulnerability. [24]

Chinese computer security group Pangu Lab said the NSA used BPF to conceal network communications as part of a complex Linux backdoor. [25]

See also

Related Research Articles

tcpdump Data-network packet analyzer

tcpdump is a data-network packet analyzer computer program that runs under a command line interface. It allows the user to display TCP/IP and other packets being transmitted or received over a network to which the computer is attached. Distributed under the BSD license, tcpdump is free software.

Unix security refers to the means of securing a Unix or Unix-like operating system. A secure environment is achieved not only by the design concepts of these operating systems, but also through vigilant user and administrative practices.

PF is a BSD licensed stateful packet filter, a central piece of software for firewalling. It is comparable to netfilter (iptables), ipfw, and ipfilter.

CMUCL is a free Common Lisp implementation, originally developed at Carnegie Mellon University.

Netfilter is a framework provided by the Linux kernel that allows various networking-related operations to be implemented in the form of customized handlers. Netfilter offers various functions and operations for packet filtering, network address translation, and port translation, which provide the functionality required for directing packets through a network and prohibiting packets from reaching sensitive locations within a network.

rm (Unix) Unix command utility

rm is a basic command on Unix and Unix-like operating systems used to remove objects such as computer files, directories and symbolic links from file systems and also special files such as device nodes, pipes and sockets, similar to the del command in MS-DOS, OS/2, and Microsoft Windows. The command is also available in the EFI shell.

In the field of computer network administration, pcap is an application programming interface (API) for capturing network traffic. While the name is an abbreviation of packet capture, that is not the API's proper name. Unix-like systems implement pcap in the libpcap library; for Windows, there is a port of libpcap named WinPcap that is no longer supported or developed, and a port named Npcap for Windows 7 and later that is still supported.

seccomp is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit , sigreturn , read and write to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.

In computing, ioctl is a system call for device-specific input/output operations and other operations which cannot be expressed by regular system calls. It takes a parameter specifying a request code; the effect of a call depends completely on the request code. Request codes are often device-specific. For instance, a CD-ROM device driver which can instruct a physical device to eject a disc would provide an ioctl request code to do so. Device-independent request codes are sometimes used to give userspace access to kernel functions which are only used by core system software or still under development.

Netlink is a socket family used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets available on certain Unix-like operating systems, including its original incarnation as a Linux kernel interface, as well as in the form of a later implementation on FreeBSD. Similarly to the Unix domain sockets, and unlike INET sockets, Netlink communication cannot traverse host boundaries. However, while the Unix domain sockets use the file system namespace, Netlink sockets are usually addressed by process identifiers (PIDs).

<span class="mw-page-title-main">Wireshark</span> Network traffic analyzer

Wireshark is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communications protocol development, and education. Originally named Ethereal, the project was renamed Wireshark in May 2006 due to trademark issues.

In operating systems, a giant lock, also known as a big-lock or kernel-lock, is a lock that may be used in the kernel to provide concurrency control required by symmetric multiprocessing (SMP) systems.

<span class="mw-page-title-main">Kernel (operating system)</span> Core of a computer operating system

The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system. It is the portion of the operating system code that is always resident in memory and facilitates interactions between hardware and software components. A full kernel controls all hardware resources via device drivers, arbitrates conflicts between processes concerning such resources, and optimizes the utilization of common resources e.g. CPU & cache usage, file systems, and network sockets. On most systems, the kernel is one of the first programs loaded on startup. It handles the rest of startup as well as memory, peripherals, and input/output (I/O) requests from software, translating them into data-processing instructions for the central processing unit.

<span class="mw-page-title-main">Unix</span> Family of computer operating systems

Unix is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and others.

nftables is a subsystem of the Linux kernel providing filtering and classification of network packets/datagrams/frames. It has been available since Linux kernel 3.13 released on 19 January 2014.

ngrep Packet analyser

ngrep is a network packet analyzer written by Jordan Ritter. It has a command-line interface, and relies upon the pcap library and the GNU regex library.

netsniff-ng Linux networking toolkit

netsniff-ng is a free Linux network analyzer and networking toolkit originally written by Daniel Borkmann. Its gain of performance is reached by zero-copy mechanisms for network packets, so that the Linux kernel does not need to copy packets from kernel space to user space via system calls such as recvmsg . libpcap, starting with release 1.0.0, also supports the zero-copy mechanism on Linux for capturing (RX_RING), so programs using libpcap also use that mechanism on Linux.

<span class="mw-page-title-main">Network scheduler</span> Arbiter on a node in packet switching communication network

A network scheduler, also called packet scheduler, queueing discipline (qdisc) or queueing algorithm, is an arbiter on a node in a packet switching communication network. It manages the sequence of network packets in the transmit and receive queues of the protocol stack and network interface controller. There are several network schedulers available for the different operating systems, that implement many of the existing network scheduling algorithms.

XDP is an eBPF-based high-performance data path used to send and receive network packets at high rates by bypassing most of the operating system networking stack. It is merged in the Linux kernel since version 4.8. This implementation is licensed under GPL. Large technology firms including Amazon, Google and Intel support its development. Microsoft released their free and open source implementation XDP for Windows in May 2022. It is licensed under MIT License.

References

  1. 1 2 McCanne, Steven; Jacobson, Van (1992-12-19). "The BSD Packet Filter: A New Architecture for User-level Packet Capture" (PDF).
  2. "Microsoft embraces Linux kernel's eBPF super-tool, extends it for Windows". The Register. 2021-05-11. Archived from the original on 2021-05-11.
  3. "bpf(4) Berkeley Packet Filter". FreeBSD. 2010-06-15.
  4. Watson, Robert N. M.; Peron, Christian S. J. (2007-03-09). "Zero-Copy BPF" (PDF).
  5. "Linux kernel 3.18, Section 1.3. bpf() syscall for eBFP virtual machine programs". kernelnewbies.org. December 7, 2014. Retrieved September 6, 2019.
  6. Jonathan Corbet (September 24, 2014). "The BPF system call API, version 14". LWN.net . Retrieved January 19, 2015.
  7. Jonathan Corbet (July 2, 2014). "Extending extended BPF". LWN.net . Retrieved January 19, 2015.
  8. "Linux kernel 3.19, Section 11. Networking". kernelnewbies.org. February 8, 2015. Retrieved February 13, 2015.
  9. Jonathan Corbet (December 10, 2014). "Attaching eBPF programs to sockets". LWN.net . Retrieved February 13, 2015.
  10. "Linux kernel 4.1, Section 11. Networking". kernelnewbies.org. June 21, 2015. Retrieved October 17, 2015.
  11. "BPF and XDP Reference Guide". cilium.readthedocs.io. April 24, 2017. Retrieved April 23, 2018.
  12. "BPF and XDP Reference Guide — Cilium 1.6.5 documentation". docs.cilium.io. Retrieved 2019-12-18.
  13. "bpf: introduce bounded loops". git.kernel.org. June 19, 2019. Retrieved August 19, 2022.
  14. "generic-ebpf/generic-ebpf". GitHub. 28 April 2022.
  15. "microsoft/ebpf-for-windows: eBPF implementation that runs on top of Windows". GitHub. Microsoft. 11 May 2021.
  16. "Overview | Solana Docs".
  17. "BPF syntax". biot.com.
  18. "Dive into BPF: a list of reading material". qmonnet.github.io.
  19. McCanne, Steven; Jacobson, Van (January 1993). "The BSD Packet Filter: A New Architecture for User-level Packet Capture". USENIX.
  20. "SCOsource update". 15 Obfuscated Copying. Archived from the original on August 25, 2003. Retrieved September 5, 2019.
  21. Bruce Perens. "Analysis of SCO's Las Vegas Slide Show". Archived from the original on February 17, 2009.
  22. Moglen, Eben (November 24, 2003). "SCO: Without Fear and Without Research". GNU Operating System. The Free Software Foundation. Retrieved September 5, 2019.
  23. "Reading privileged memory with a side-channel". Project Zero team at Google. January 3, 2018. Retrieved January 20, 2018.
  24. "bpf: introduce BPF_JIT_ALWAYS_ON config". git.kernel.org. Archived from the original on 2020-10-19. Retrieved 2021-09-20.
  25. "Anatomy of suspected top-tier decade-hidden NSA backdoor". The Register. February 23, 2022. Retrieved February 24, 2022.

Further reading