Mmap

Last updated

In computing, mmap(2) is a POSIX-compliant Unix system call that maps files or devices into memory. It is a method of memory-mapped file I/O. It implements demand paging because file contents are not immediately read from disk and initially use no physical RAM at all. The actual reads from disk are performed after a specific location is accessed, in a lazy manner. After the mapping is no longer needed, the pointers must be unmapped with munmap(2). Protection information—for example, marking mapped regions as executable—can be managed using mprotect(2), and special treatment can be enforced using madvise(2).

Contents

In Linux, macOS and the BSDs, mmap can create several types of mappings. Other operating systems may only support a subset of these; for example, shared mappings may not be practical in an operating system without a global VFS or I/O cache.

History

The original design of memory-mapped files came from the TOPS-20 operating system. mmap and associated systems calls were designed as part of the Berkeley Software Distribution (BSD) version of Unix. Their API was already described in the 4.2BSD System Manual, even though it was neither implemented in that release, nor in 4.3BSD. [1] Sun Microsystems had implemented this very API, though, in their SunOS operating system. The BSD developers at University of California, Berkeley unsuccessfully requested Sun to donate its implementation; 4.3BSD-Reno was instead shipped with an implementation based on the virtual memory system of Mach. [2]

File-backed and anonymous

File-backed mapping maps an area of the process's virtual memory to files; that is, reading those areas of memory causes the file to be read. It is the default mapping type.

Anonymous mapping maps an area of the process's virtual memory not backed by any file. The contents are initialized to zero. [3] In this respect an anonymous mapping is similar to malloc, and is used in some malloc implementations for certain allocations, particularly large ones. Anonymous mappings are not part of the POSIX standard, but are implemented in almost all operating systems by the MAP_ANONYMOUS and MAP_ANON flags.

Memory visibility

If the mapping is shared (the MAP_SHARED flag is set), then it is preserved when a process is forked (using a fork(2) system call). Therefore, writes to a mapped area in one process are immediately visible in all related (parent, child or sibling) processes. If the mapping is shared and backed by a file (not MAP_ANONYMOUS) the underlying file medium is only guaranteed to be written after it is passed to the msync(2) system call. In contrast, if the mapping is private (the MAP_PRIVATE flag is set), the changes will neither be seen by other processes nor written to the file.

A process reading from, or writing to, the underlying file will not always see the same data as a different process that has mapped the file, since segments of the file are copied into RAM and only periodically flushed to disk. Synchronization can be forced with a call to msync(2).

Using mmap on files can significantly reduce memory overhead for applications accessing the same file; they can share the memory area the file encompasses, instead of loading the file for each application that wants access to it. This means that mmap(2) is sometimes used for Interprocess Communication (IPC). On modern operating systems, mmap(2) is typically preferred to the System V IPC Shared Memory facility. [4]

The main difference between System V shared memory (shmem) and memory mapped I/O (mmap) is that System V shared memory is persistent: unless explicitly removed by a process, it is kept in memory and remains available until the system is shut down. mmap'd memory is not persistent between application executions (unless it is backed by a file).

Example of usage under the C programming language

#include<sys/types.h>#include<sys/mman.h>#include<err.h>#include<fcntl.h>#include<stdio.h>#include<stdlib.h>#include<string.h>#include<unistd.h>/* This example shows how an mmap of /dev/zero is equivalent to   using anonymous memory (MAP_ANON) not connected to any file.   N.B. MAP_ANONYMOUS or MAP_ANON are supported by most UNIX   versions, removing the original purpose of /dev/zero.*//* Does not work on OS X or macOS, where you can't mmap over /dev/zero */intmain(void){constcharstr1[]="string 1";constcharstr2[]="string 2";pid_tparpid=getpid(),childpid;intfd=-1;char*anon,*zero;if((fd=open("/dev/zero",O_RDWR,0))==-1)err(1,"open");anon=(char*)mmap(NULL,4096,PROT_READ|PROT_WRITE,MAP_ANON|MAP_SHARED,-1,0);zero=(char*)mmap(NULL,4096,PROT_READ|PROT_WRITE,MAP_SHARED,fd,0);if(anon==MAP_FAILED||zero==MAP_FAILED)errx(1,"either mmap");strcpy(anon,str1);strcpy(zero,str1);printf("PID %d:\tanonymous %s, zero-backed %s\n",parpid,anon,zero);switch((childpid=fork())){case-1:err(1,"fork");/* NOTREACHED */case0:childpid=getpid();printf("PID %d:\tanonymous %s, zero-backed %s\n",childpid,anon,zero);sleep(3);printf("PID %d:\tanonymous %s, zero-backed %s\n",childpid,anon,zero);munmap(anon,4096);munmap(zero,4096);close(fd);returnEXIT_SUCCESS;}sleep(2);strcpy(anon,str2);strcpy(zero,str2);printf("PID %d:\tanonymous %s, zero-backed %s\n",parpid,anon,zero);munmap(anon,4096);munmap(zero,4096);close(fd);returnEXIT_SUCCESS;}

sample output:

PID 22475:      anonymous string 1, zero-backed string 1 PID 22476:      anonymous string 1, zero-backed string 1 PID 22475:      anonymous string 2, zero-backed string 2 PID 22476:      anonymous string 2, zero-backed string 2 

Usage in database implementations

The mmap system call has been used in various database implementations as an alternative for implementing a buffer pool, although this created a different set of problems that could realistically only be fixed using a buffer pool. [5]

See also

Related Research Articles

<span class="mw-page-title-main">AWK</span> Programming language

AWK is a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. Like sed and grep, it is a filter, and is a standard feature of most Unix-like operating systems.

Berkeley sockets is an application programming interface (API) for Internet sockets and Unix domain sockets, used for inter-process communication (IPC). It is commonly implemented as a library of linkable modules. It originated with the 4.2BSD Unix operating system, which was released in 1983.

In computing, particularly in the context of the Unix operating system and its workalikes, fork is an operation whereby a process creates a copy of itself. It is an interface which is required for compliance with the POSIX and Single UNIX Specification standards. It is usually implemented as a C standard library wrapper to the fork, clone, or other system calls of the kernel. Fork is the primary method of process creation on Unix-like operating systems.

The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. Starting from the original ANSI C standard, it was developed at the same time as the C library POSIX specification, which is a superset of it. Since ANSI C was adopted by the International Organization for Standardization, the C standard library is also called the ISO C library.

C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely malloc, realloc, calloc, aligned_alloc and free.

/dev/zero is a special file in Unix-like operating systems that provides as many null characters as are read from it. One of the typical uses is to provide a character stream for initializing data storage.

In Unix and Unix-like computer operating systems, a file descriptor is a process-unique identifier (handle) for a file or other input/output resource, such as a pipe or network socket.

The seven standard Unix file types are regular, directory, symbolic link, FIFO special, block special, character special, and socket as defined by POSIX. Different OS-specific implementations allow more types than what POSIX requires. A file's type can be identified by the ls -l command, which displays the type in the first character of the file-system permissions field.

The proc filesystem (procfs) is a special filesystem in Unix-like operating systems that presents information about processes and other system information in a hierarchical file-like structure, providing a more convenient and standardized method for dynamically accessing process data held in the kernel than traditional tracing methods or direct access to kernel memory. Typically, it is mapped to a mount point named /proc at boot time. The proc file system acts as an interface to internal data structures about running processes in the kernel. In Linux, it can also be used to obtain information about the kernel and to change certain kernel parameters at runtime (sysctl).

In C programming, the functions getaddrinfo and getnameinfo convert domain names, hostnames, and IP addresses between human-readable text representations and structured binary formats for the operating system's networking API. Both functions are contained in the POSIX standard application programming interface (API).

"Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another or in which unnecessary data copies are avoided. This is frequently used to save CPU cycles and memory bandwidth in many time consuming tasks, such as when transmitting a file at high speed over a network, etc., thus improving the performance of programs (processes) executed by a computer.

A scanf format string is a control parameter used in various functions to specify the layout of an input string. The functions can then divide the string and translate into values of appropriate data types. String scanning functions are often supplied in standard libraries.Scanf is a function that reads formatted data from the standard input string, which is usually the keyboard and writes the results whenever called in the specified arguments.

In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process, replacing the previous executable. This act is also referred to as an overlay. It is especially important in Unix-like systems, although it exists elsewhere. As no new process is created, the process identifier (PID) does not change, but the machine code, data, heap, and stack of the process are replaced by those of the new program.

A memory-mapped file is a segment of virtual memory that has been assigned a direct byte-for-byte correlation with some portion of a file or file-like resource. This resource is typically a file that is physically present on disk, but can also be a device, shared memory object, or other resource that the operating system can reference through a file descriptor. Once present, this correlation between the file and the memory space permits applications to treat the mapped portion as if it were primary memory.

select is a system call and application programming interface (API) in Unix-like and POSIX-compliant operating systems for examining the status of file descriptors of open input/output channels. The select system call is similar to the poll facility introduced in UNIX System V and later operating systems. However, with the c10k problem, both select and poll have been superseded by the likes of kqueue, epoll, /dev/poll and I/O completion ports.

Java Native Access (JNA) is a community-developed library that provides Java programs easy access to native shared libraries without using the Java Native Interface (JNI). JNA's design aims to provide native access in a natural way with a minimum of effort. Unlike JNI, no boilerplate or generated glue code is required.

Getopt is a C library function used to parse command-line options of the Unix/POSIX style. It is a part of the POSIX specification, and is universal to Unix-like systems. It is also the name of a Unix program for parsing command line arguments in shell scripts.

The C programming language has a set of functions implementing operations on strings in its standard library. Various operations, such as copying, concatenation, tokenization and searching are supported. For character strings, the standard library uses the convention that strings are null-terminated: a string of n characters is represented as an array of n + 1 elements, the last of which is a "NUL character" with numeric value 0.

<span class="mw-page-title-main">Shared memory</span> Computer memory that can be accessed by multiple processes

In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between programs. Depending on context, programs may run on a single processor or on multiple separate processors.

A virtual kernel architecture (vkernel) is an operating system virtualisation paradigm where kernel code can be compiled to run in the user space, for example, to ease debugging of various kernel-level components, in addition to general-purpose virtualisation and compartmentalisation of system resources. It is used by DragonFly BSD in its vkernel implementation since DragonFly 1.7, having been first revealed in September 2006, and first released in the stable branch with DragonFly 1.8 in January 2007. The long-term goal, in addition to easing kernel development, is to make it easier to support internet-connected computer clusters without compromising local security. Similar concepts exist in other operating systems as well; in Linux, a similar virtualisation concept is known as user-mode Linux; whereas in NetBSD since the summer of 2007, it has been the initial focus of the rump kernel infrastructure.

References

  1. William Joy; Eric Cooper; Robert Fabry; Samuel Leffler; Kirk McKusick; David Mosher (1983). 4.2BSD System Manual (PDF) (Report). Computer Systems Research Group, University of California, Berkeley.
  2. McKusick, Marshall Kirk (1999). "Twenty Years of Berkeley Unix: From AT&T-Owned to Freely Redistributable". Open Sources: Voices from the Open Source Revolution. O'Reilly.
  3. "mmap(2) - Linux manual page".
  4. Kerrisk, Michael (2010). The Linux programming interface : a Linux and UNIX system programming handbook. San Francisco: No Starch Press. p. 1116. ISBN   978-1-59327-291-3. OCLC   728672600.
  5. "Are You Sure You Want to Use MMAP in Your Database Management System?". db.cs.cmu.edu. Retrieved 2023-07-04.

Further reading