This article needs to be updated.(May 2012) |
Original author(s) | Andrea Arcangeli |
---|---|
Initial release | March 8, 2005 |
Written in | C |
Operating system | Linux |
Type | Sandboxing |
License | GNU General Public License |
Website | code |
seccomp (short for secure computing [1] ) is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit()
, sigreturn()
, read()
and write()
to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. [2] [3] In this sense, it does not virtualize the system's resources but isolates the process from them entirely.
seccomp mode is enabled via the PR_SET_SECCOMP
argument, or (since Linux kernel 3.17 [4] ) via the system call. [5] seccomp mode used to be enabled by writing to a file, /proc/self/seccomp
, but this method was removed in favor of prctl()
. [6] In some kernel versions, seccomp disables the RDTSC
x86 instruction, which returns the number of elapsed processor cycles since power-on, used for high-precision timing. [7]
seccomp-bpf is an extension to seccomp [8] that allows filtering of system calls using a configurable policy implemented using Berkeley Packet Filter rules. It is used by OpenSSH [9] and vsftpd as well as the Google Chrome/Chromium web browsers on ChromeOS and Linux. [10] (In this regard seccomp-bpf achieves similar functionality, but with more flexibility and higher performance, to the older systrace—which seems to be no longer supported for Linux.)
Some consider seccomp comparable to OpenBSD pledge(2) and FreeBSD capsicum(4)[ citation needed ].
seccomp was first devised by Andrea Arcangeli in January 2005 for use in public grid computing and was originally intended as a means of safely running untrusted compute-bound programs. It was merged into the Linux kernel mainline in kernel version 2.6.12, which was released on March 8, 2005. [11]
--sandbox
[14] --security-opt
parameter.In computing, the Executable and Linkable Format is a common standard file format for executable files, object code, shared libraries, and core dumps. First published in the specification for the application binary interface (ABI) of the Unix operating system version named System V Release 4 (SVR4), and later in the Tool Interface Standard, it was quickly accepted among different vendors of Unix systems. In 1999, it was chosen as the standard binary file format for Unix and Unix-like systems on x86 processors by the 86open project.
In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
In computer security, a sandbox is a security mechanism for separating running programs, usually in an effort to mitigate system failures and/or software vulnerabilities from spreading. The sandbox metaphor derives from the concept of a child's sandbox—a play area where children can build, destroy, and experiment without causing any real-world damage. It is often used to kill untested or untrusted programs or code, possibly from unverified or untrusted third parties, suppliers, users or websites, without risking harm to the host machine or operating system. A sandbox typically provides a tightly controlled set of resources for guest programs to run in, such as storage and memory scratch space. Network access, the ability to inspect the host system, or read from input devices are usually disallowed or heavily restricted.
QEMU is a free and open-source emulator that uses dynamic binary translation to emulate the processor of a computer. It provides a variety of hardware and device models for the machine, enabling it to run different guest operating systems. QEMU can be used in conjunction with Kernel-based Virtual Machine (KVM) to execute virtual machines at near-native speeds. Additionally, QEMU supports the emulation of user-level processes, allowing applications compiled for one processor architecture to run on another.
OS-level virtualization is an operating system (OS) virtualization paradigm in which the kernel allows the existence of multiple isolated user space instances, including containers, zones, virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels, and jails. Such instances may look like real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can see all resources of that computer. Programs running inside a container can only see the container's contents and devices assigned to the container.
Kernel-based Virtual Machine (KVM) is a free and open-source virtualization module in the Linux kernel that allows the kernel to function as a hypervisor. It was merged into the mainline Linux kernel in version 2.6.20, which was released on February 5, 2007. KVM requires a processor with hardware virtualization extensions, such as Intel VT or AMD-V. KVM has also been ported to other operating systems such as FreeBSD and illumos in the form of loadable kernel modules.
The Berkeley Packet Filter is a network tap and packet filter which permits computer network packets to be captured and filtered at the operating system level. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received, and allows a userspace process to supply a filter program that specifies which packets it wants to receive. For example, a tcpdump process may want to receive only packets that initiate a TCP connection. BPF returns only packets that pass the filter that the process supplies. This avoids copying unwanted packets from the operating system kernel to the process, greatly improving performance. The filter program is in the form of instructions for a virtual machine, which are interpreted, or compiled into machine code by a just-in-time (JIT) mechanism and executed, in the kernel.
Google Chrome is a web browser developed by Google. It was first released in 2008 for Microsoft Windows, built with free software components from Apple WebKit and Mozilla Firefox. Versions were later released for Linux, macOS, iOS, iPadOS, and also for Android, where it is the default browser. The browser is also the main component of ChromeOS, where it serves as the platform for web applications.
Google Native Client (NaCl) is a discontinued sandboxing technology for running either a subset of Intel x86, ARM, or MIPS native code, or a portable executable, in a sandbox. It allows safely running native code from a web browser, independent of the user operating system, allowing web apps to run at near-native speeds, which aligns with Google's plans for ChromeOS. It may also be used for securing browser plugins, and parts of other applications or full applications such as ZeroVM.
The Linux kernel is a free and open source, UNIX-like kernel that is used in many computer systems worldwide. The kernel was created by Linus Torvalds in 1991 and was soon adopted as the kernel for the GNU operating system (OS) which was created to be a free replacement for Unix. Since the late 1990s, it has been included in many operating system distributions, many of which are called Linux. One such Linux kernel operating system is Android which is used in many mobile and embedded devices.
Pwn2Own is a computer hacking contest held annually at the CanSecWest security conference. First held in April 2007 in Vancouver, the contest is now held twice a year, most recently in March 2024. Contestants are challenged to exploit widely used software and mobile devices with previously unknown vulnerabilities. Winners of the contest receive the device that they exploited and a cash prize. The Pwn2Own contest serves to demonstrate the vulnerability of devices and software in widespread use while also providing a checkpoint on the progress made in security since the previous year.
Linux Containers (LXC) is an operating system-level virtualization method for running multiple isolated Linux systems (containers) on a control host using a single Linux kernel.
Peppermint OS is a Linux distribution based on Debian and Devuan Stable, and formerly based on Ubuntu. It uses the Xfce desktop environment. It aims to provide a familiar environment for newcomers to Linux, which requires relatively low hardware resources to run.
Second Level Address Translation (SLAT), also known as nested paging, is a hardware-assisted virtualization technology which makes it possible to avoid the overhead associated with software-managed shadow page tables.
Namespaces are a feature of the Linux kernel that partition kernel resources such that one set of processes sees one set of resources, while another set of processes sees a different set of resources. The feature works by having the same namespace for a set of resources and processes, but those namespaces refer to distinct resources. Resources may exist in multiple namespaces. Examples of such resources are process IDs, host-names, user IDs, file names, some names associated with network access, and Inter-process communication.
Snap is a software packaging and deployment system developed by Canonical for operating systems that use the Linux kernel and the systemd init system. The packages, called snaps, and the tool for using them, snapd, work across a range of Linux distributions and allow upstream software developers to distribute their applications directly to users. Snaps are self-contained applications running in a sandbox with mediated access to the host system. Snap was originally released for cloud applications but was later ported to also work for Internet of Things devices and desktop applications.
Flatpak is a utility for software deployment and package management for Linux. It is advertised as offering a sandbox environment in which users can run application software in isolation from the rest of the system. Flatpak was known as xdg-app until 2016.
Kernel page-table isolation is a Linux kernel feature that mitigates the Meltdown security vulnerability and improves kernel hardening against attempts to bypass kernel address space layout randomization (KASLR). It works by better isolating user space and kernel space memory. KPTI was merged into Linux kernel version 4.15, and backported to Linux kernels 4.14.11, 4.9.75, and 4.4.110. Windows and macOS released similar updates. KPTI does not address the related Spectre vulnerability.
io_uring is a Linux kernel system call interface for storage device asynchronous I/O operations addressing performance issues with similar interfaces provided by functions like read
/write
or aio_read
/aio_write
etc. for operations on data accessed by file descriptors.
eBPF is a technology that can run programs in a privileged context such as the operating system kernel. It is the successor to the Berkeley Packet Filter filtering mechanism in Linux and is also used in non-networking parts of the Linux kernel as well.