This article may be too technical for most readers to understand.(June 2016) |
"Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another or in which unnecessary data copies are avoided. This is frequently used to save CPU cycles and memory bandwidth in many time consuming tasks, such as when transmitting a file at high speed over a network, etc., thus improving the performance of programs (processes) executed by a computer. [1] [2] [3] [4]
Zero-copy programming techniques can be used when exchanging data within a user space process (i.e. between two or more threads, etc.) and/or between two or more processes (see also producer–consumer problem) and/or when data has to be accessed / copied / moved inside kernel space or between a user space process and kernel space portions of operating systems (OS).
Usually when a user space process has to execute system operations like reading or writing data from/to a device (i.e. a disk, a NIC, etc.) through their high level software interfaces or like moving data from one device to another, etc., it has to perform one or more system calls that are then executed in kernel space by the operating system.
If data has to be copied or moved from source to destination and both are located inside kernel space (i.e. two files, a file and a network card, etc.) then unnecessary data copies, from kernel space to user space and from user space to kernel space, can be avoided by using special (zero-copy) system calls, usually available in most recent versions of popular operating systems.
Zero-copy versions of operating system elements, such as device drivers, file systems, network protocol stacks, etc., greatly increase the performance of certain application programs (that become processes when executed) and more efficiently utilize system resources. Performance is enhanced by allowing the CPU to move on to other tasks while data copies / processing proceed in parallel in another part of the machine. Also, zero-copy operations reduce the number of time-consuming context switches between user space and kernel space. System resources are utilized more efficiently since using a sophisticated CPU to perform extensive data copy operations, which is a relatively simple task, is wasteful if other simpler system components can do the copying.
As an example, reading a file and then sending it over a network the traditional way requires 2 extra data copies (1 to read from kernel to user space + 1 to write from user to kernel space) and 4 context switches per read/write cycle. Those extra data copies use the CPU. Sending that file by using mmap of file data and a cycle of write calls, reduces the context switches to 2 per write call and avoids those previous 2 extra user data copies. Sending the same file via zero copy reduces the context switches to 2 per sendfile call and eliminates all CPU extra data copies (both in user and in kernel space). [1] [2] [3] [4]
Zero-copy protocols are especially important for very high-speed networks in which the capacity of a network link approaches or exceeds the CPU's processing capacity. In such a case the CPU may spend nearly all of its time copying transferred data, and thus becomes a bottleneck which limits the communication rate to below the link's capacity. A rule of thumb used in the industry is that roughly one CPU clock cycle is needed to process one bit of incoming data.
An early implementation was IBM OS/360 where a program can instruct the channel subsystem to read blocks of data from one file or device into a buffer and write to another from the same buffer without moving the data.
Techniques for creating zero-copy software include the use of direct memory access (DMA)-based copying and memory-mapping through a memory management unit (MMU). These features require specific hardware support and usually involve particular memory alignment requirements.
A newer approach used by the Heterogeneous System Architecture (HSA) facilitates the passing of pointers between the CPU and the GPU and also other processors. This requires a unified address space for the CPU and the GPU. [5] [6]
Several operating systems support zero-copying of user data and file contents through specific APIs.
Here are listed only a few well known system calls / APIs available in most popular OSs.
Novell NetWare supports a form of zero-copy through Event Control Blocks (ECBs), see NCOPY.
The internal COPY command in some versions of DR-DOS since 1992 initiates this as well when COMMAND.COM detects that the files to be copied are stored on a NetWare file server, [7] otherwise it falls back to normal file copying. The external MOVE command since DR DOS 6.0 (1991) and MS-DOS 6.0 (1993) internally performs a RENAME (causing just the directory entries to be modified in the file system instead of physically copying the file data) when the source and destination are located on the same logical volume. [8]
The Linux kernel supports zero-copy through various system calls, such as:
Some of them are specified in POSIX and thus also present in the BSD kernels or IBM AIX, some are unique to the Linux kernel API.
FreeBSD, NetBSD, OpenBSD, DragonFly BSD, etc. support zero-copy through at least these system calls:
MacOS should support zero-copy through the FreeBSD portion of the kernel because it offers the same system calls (and its manual pages are still tagged BSD) such as:
Oracle Solaris supports zero-copy through at least these system calls:
Microsoft Windows supports zero-copy through at least this system call:
Java input streams can support zero-copy through the java.nio.channels.FileChannel's transferTo() method if the underlying operating system also supports zero copy. [28]
RDMA (Remote Direct Memory Access) protocols deeply rely on zero-copy techniques.
In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.
DragonFly BSD is a free and open-source Unix-like operating system forked from FreeBSD 4.8. Matthew Dillon, an Amiga developer in the late 1980s and early 1990s and FreeBSD developer between 1994 and 2003, began working on DragonFly BSD in June 2003 and announced it on the FreeBSD mailing lists on 16 July 2003.
chroot
is an operation on Unix and Unix-like operating systems that changes the apparent root directory for the current running process and its children. A program that is run in such a modified environment cannot name files outside the designated directory tree. The term "chroot" may refer to the chroot(2) system call or the chroot(8) wrapper program. The modified environment is called a chroot jail.
In Unix and Unix-like computer operating systems, a file descriptor is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.
Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably redirecting code execution to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.
These tables provide a comparison of operating systems, of computer devices, as listing general and technical information for a number of widely used and currently available PC or handheld operating systems. The article "Usage share of operating systems" provides a broader, and more general, comparison of operating systems that includes servers, mainframes and supercomputers.
DTrace is a comprehensive dynamic tracing framework originally created by Sun Microsystems for troubleshooting kernel and application problems on production systems in real time. Originally developed for Solaris, it has since been released under the free Common Development and Distribution License (CDDL) in OpenSolaris and its descendant illumos, and has been ported to several other Unix-like systems.
rm
is a basic command on Unix and Unix-like operating systems used to remove objects such as computer files, directories and symbolic links from file systems and also special files such as device nodes, pipes and sockets, similar to the del
command in MS-DOS, OS/2, and Microsoft Windows. The command is also available in the EFI shell.
The proc filesystem (procfs) is a special filesystem in Unix-like operating systems that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory. Typically, it is mapped to a mount point named /proc at boot time. The proc file system acts as an interface to internal data structures about running processes in the kernel. In Linux, it can also be used to obtain information about the kernel and to change certain kernel parameters at runtime (sysctl).
File attributes are a type of meta-data that describe and may modify how files and/or directories in a filesystem behave. Typical file attributes may, for example, indicate or specify whether a file is visible, modifiable, compressed, or encrypted. The availability of most file attributes depends on support by the underlying filesystem where attribute data must be stored along with other control structures. Each attribute can have one of two states: set and cleared. Attributes are considered distinct from other metadata, such as dates and times, filename extensions or file system permissions. In addition to files, folders, volumes and other file system objects may have attributes.
In computing, the sticky bit is a user ownership access right flag that can be assigned to files and directories on Unix-like systems.
In computing, preemption is the act of temporarily interrupting an executing task, with the intention of resuming it at a later time. This interrupt is done by an external scheduler with no assistance or cooperation from the task. This preemptive scheduler usually runs in the most privileged protection ring, meaning that interruption and then resumption are considered highly secure actions. Such changes to the currently executing task of a processor are known as context switching.
OS-level virtualization is an operating system (OS) virtualization paradigm in which the kernel allows the existence of multiple isolated user space instances, called containers, zones, virtual private servers (OpenVZ), partitions, virtual environments (VEs), virtual kernels, or jails. Such instances may look like real computers from the point of view of programs running in them. A computer program running on an ordinary operating system can see all resources of that computer. However, programs running inside of a container can only see the container's contents and devices assigned to the container.
In computer security, executable-space protection marks memory regions as non-executable, such that an attempt to execute machine code in these regions will cause an exception. It makes use of hardware features such as the NX bit, or in some cases software emulation of those features. However, technologies that emulate or supply an NX bit will usually impose a measurable overhead while using a hardware-supplied NX bit imposes no measurable overhead.
Wireless network cards for computers require control software to make them function. This is a list of the status of some open-source drivers for 802.11 wireless network cards.
These tables compare free software / open-source operating systems. Where not all of the versions support a feature, the first version which supports it is listed.
The following tables compare general and technical information for a number of file systems.
In Unix-like operating systems, a device file, device node, or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow an application program to interact with a device by using its device driver via standard input/output system calls. Using standard system calls simplifies many programming tasks, and leads to consistent user-space I/O mechanisms regardless of device features and functions.
epoll
is a Linux kernel system call for a scalable I/O event notification mechanism, first introduced in version 2.5.45 of the Linux kernel. Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them. It is meant to replace the older POSIX select(2)
and poll(2)
system calls, to achieve better performance in more demanding applications, where the number of watched file descriptors is large (unlike the older system calls, which operate in O(n) time, epoll
operates in O(1) time).
{{cite journal}}
: Cite journal requires |journal=
(help){{cite book}}
: |work=
ignored (help) (NB. NWDOSTIP.TXT is a comprehensive work on Novell DOS 7 and OpenDOS 7.01, including the description of many undocumented features and internals. It is part of the author's yet larger MPDOSTIP.ZIP
collection maintained up to 2001 and distributed on many sites at the time. The provided link points to a HTML-converted older version of the NWDOSTIP.TXT
file.)