Mmap

Last updated November 12, 2024. From Wikipedia, the free encyclopedia.

In computing, mmap(2) is a POSIX-compliant Unix system call that maps files or devices into memory. It is a method of memory-mapped file I/O. It implements demand paging: file contents are not read from disk when the mapping is created and initially use no physical RAM at all. The actual reads from disk are performed lazily, when a specific location is first accessed. When the mapping is no longer needed, it must be released with munmap(2). Protection information (for example, marking mapped regions as executable) can be managed using mprotect(2), and hints requesting special treatment can be given using madvise(2).

History

The original design of memory-mapped files came from the TOPS-20 operating system. mmap and associated system calls were designed as part of the Berkeley Software Distribution (BSD) version of Unix. Their API was already described in the 4.2BSD System Manual, even though it was implemented neither in that release nor in 4.3BSD.^[1] Sun Microsystems, however, had implemented this very API in its SunOS operating system. The BSD developers at the University of California, Berkeley, asked Sun to donate its implementation, without success; 4.3BSD-Reno was instead shipped with an implementation based on the virtual memory system of Mach.^[2]

File-backed and anonymous

File-backed mapping maps an area of the process's virtual memory to files; that is, reading those areas of memory causes the file to be read. It is the default mapping type.

Anonymous mapping maps an area of the process's virtual memory not backed by any file, made available via the MAP_ANONYMOUS/MAP_ANON flags. The contents are initialized to zero.^[3] In this respect an anonymous mapping is similar to malloc, and is used in some malloc implementations for certain allocations, particularly large ones.

Memory visibility

If the mapping is shared (the MAP_SHARED flag is set), then it is preserved when a process is forked (using the fork(2) system call). Therefore, writes to a mapped area in one process are immediately visible in all related (parent, child or sibling) processes. If the mapping is shared and backed by a file (not MAP_ANONYMOUS), the underlying file is only guaranteed to have been updated after the mapped region has been passed to the msync(2) system call. In contrast, if the mapping is private (the MAP_PRIVATE flag is set), the changes will neither be seen by other processes nor written to the file.

A process reading from, or writing to, the underlying file will not always see the same data as a different process that has mapped the file, since segments of the file are copied into RAM and only periodically flushed to disk. Synchronization can be forced with a call to msync(2).

Using mmap on files can significantly reduce memory overhead for applications accessing the same file; they can share the memory area the file encompasses, instead of each application loading its own copy of the file. Because the mapped memory is shared, mmap(2) is sometimes used for inter-process communication (IPC). On modern operating systems, mmap(2) is typically preferred to the System V IPC Shared Memory facility.^[4]

The main difference between System V shared memory (shmem) and memory-mapped I/O (mmap) is that System V shared memory is persistent: unless explicitly removed by a process, it is kept in memory and remains available until the system is shut down. mmap'd memory is not persistent between application executions (unless it is backed by a file).

Example of usage under the C programming language

#include <sys/types.h>
#include <sys/mman.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* This example shows how an mmap of /dev/zero is equivalent to
   using anonymous memory (MAP_ANON) not connected to any file.
   N.B. MAP_ANONYMOUS or MAP_ANON are supported by most UNIX
   versions, removing the original purpose of /dev/zero.
*/
/* Does not work on OS X or macOS, where you can't mmap over /dev/zero */
int main(void)
{
    const char str1[] = "string 1";
    const char str2[] = "string 2";
    pid_t parpid = getpid(), childpid;
    int fd = -1;
    char *anon, *zero;

    if ((fd = open("/dev/zero", O_RDWR, 0)) == -1)
        err(1, "open");

    anon = (char *)mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0);
    zero = (char *)mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (anon == MAP_FAILED || zero == MAP_FAILED)
        errx(1, "either mmap");

    strcpy(anon, str1);
    strcpy(zero, str1);

    printf("PID %d:\tanonymous %s, zero-backed %s\n", parpid, anon, zero);
    switch ((childpid = fork())) {
    case -1:
        err(1, "fork");
        /* NOTREACHED */
    case 0:
        childpid = getpid();
        printf("PID %d:\tanonymous %s, zero-backed %s\n", childpid, anon, zero);
        sleep(3);

        printf("PID %d:\tanonymous %s, zero-backed %s\n", childpid, anon, zero);
        munmap(anon, 4096);
        munmap(zero, 4096);
        close(fd);
        return EXIT_SUCCESS;
    }

    sleep(2);
    strcpy(anon, str2);
    strcpy(zero, str2);

    printf("PID %d:\tanonymous %s, zero-backed %s\n", parpid, anon, zero);
    munmap(anon, 4096);
    munmap(zero, 4096);
    close(fd);
    return EXIT_SUCCESS;
}

Sample output:

PID 22475:      anonymous string 1, zero-backed string 1
PID 22476:      anonymous string 1, zero-backed string 1
PID 22475:      anonymous string 2, zero-backed string 2
PID 22476:      anonymous string 2, zero-backed string 2

Usage in database implementations

The mmap system call has been used in various database implementations as an alternative for implementing a buffer pool, although this created a different set of problems that could realistically only be fixed using a buffer pool.^[5]

References

[1] William Joy; Eric Cooper; Robert Fabry; Samuel Leffler; Kirk McKusick; David Mosher (1983). 4.2BSD System Manual (PDF) (Report). Computer Systems Research Group, University of California, Berkeley.
[2] McKusick, Marshall Kirk (1999). "Twenty Years of Berkeley Unix: From AT&T-Owned to Freely Redistributable". Open Sources: Voices from the Open Source Revolution. O'Reilly.
[3] "mmap(2) - The Open Group Base Specifications Issue 8".
[4] Kerrisk, Michael (2010). The Linux Programming Interface: A Linux and UNIX System Programming Handbook. San Francisco: No Starch Press. p. 1116. ISBN 978-1-59327-291-3. OCLC 728672600.
[5] "Are You Sure You Want to Use MMAP in Your Database Management System?". db.cs.cmu.edu. Retrieved 2023-07-04.
