Maildir

The Maildir e-mail format is a common way of storing email messages on a file system, rather than in a database. Each message is assigned a file with a unique name, and each mail folder is a file system directory containing these files. Maildir was designed by Daniel J. Bernstein circa 1995, with a major goal of eliminating the need for program code to handle file locking and unlocking by relying on the local filesystem instead. [1] The Maildir design reflects the fact that the only operations valid for an email message are that it be created, deleted or have its status changed in some way.

Internal structure

Specifications

A Maildir directory (often named Maildir) usually has three subdirectories named tmp, new, and cur. [2]
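The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `create_maildir` and the 0o700 permission mode are assumptions, not part of any specification.

```python
import os


def create_maildir(path):
    """Create a Maildir at `path` with its tmp, new and cur subdirectories.

    Hypothetical helper for illustration; the mode 0o700 is an assumption.
    """
    for sub in ("tmp", "new", "cur"):
        # exist_ok=True makes the call safe to repeat on an existing Maildir.
        os.makedirs(os.path.join(path, sub), mode=0o700, exist_ok=True)
    return path
```

Calling `create_maildir("Maildir")` in a user's home directory would produce the three subdirectories named above.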

Maildir++

Sam Varshavchik, the author of the Courier Mail Server and other software, defined the Maildir++ extension [3] [4] to the Maildir format to support subfolders and mail quotas. Maildir++ directories contain subdirectories with names that start with a '.' (dot) which are also Maildir++ folders. The extension complies with the original Maildir specification, which allows for subdirectories in addition to tmp, new and cur.
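As a rough sketch of the naming convention just described, the following Python function lists the dot-prefixed subdirectories of a Maildir++ directory. The function name is hypothetical, and the check is deliberately minimal: it looks only at the leading dot and does not verify that each subfolder itself contains tmp, new and cur.

```python
import os


def list_maildirpp_folders(maildir):
    """Return the names of dot-prefixed subdirectories of `maildir`.

    Hypothetical helper: it applies only the naming rule (leading '.'),
    and does not validate the contents of each subfolder.
    """
    folders = []
    for entry in sorted(os.listdir(maildir)):
        full = os.path.join(maildir, entry)
        # os.listdir never returns "." or "..", so no special-casing is needed.
        if entry.startswith(".") and os.path.isdir(full):
            folders.append(entry)
    return folders
```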

Technical operation

A mail delivery agent is a program that delivers an email message into a Maildir. The mail delivery agent creates a new file with a unique filename in the tmp directory. [5] [6] [3] At the time of Maildir's invention, guaranteeing unique filenames efficiently was difficult. The original qmail [1] algorithm for unique names was:

  1. read the current Unix time
  2. read the current process identifier (PID)
  3. read the current hostname
  4. concatenate the above three values into a string separated by the period character; this is the new filename
  5. if stat() reports that the filename exists, then wait two seconds
  6. go to previous step until the filename does not exist
  7. create a file with the unique filename and write the message contents to the new file
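The name-generation steps above (1 to 6) might look roughly like this in Python. It is a sketch, not qmail's code: the function name and the injectable `clock` and `sleep` parameters are assumptions added to make the loop easy to exercise, and the sketch re-reads the clock after each wait, which is one reading of "go to previous step".

```python
import os
import socket
import time


def original_unique_path(tmp_dir, wait_seconds=2, clock=time.time, sleep=time.sleep):
    """Return a path in `tmp_dir` named time.pid.hostname that does not exist yet.

    Illustrative only; `clock` and `sleep` are hypothetical hooks for testing.
    """
    while True:
        # Steps 1-4: Unix time, PID and hostname joined by periods.
        name = "{}.{}.{}".format(int(clock()), os.getpid(), socket.gethostname())
        path = os.path.join(tmp_dir, name)
        try:
            os.stat(path)  # Step 5: does the name already exist?
        except FileNotFoundError:
            return path  # The name is free; the caller creates the file (step 7).
        sleep(wait_seconds)  # Steps 5-6: wait, then try again.
```

Note that the check and the later file creation are separate operations, so the sketch inherits the race condition that later critics of the algorithm pointed out.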

By 2000, the qmail author recommended in an updated specification [5] to append the value of a per-process counter to the PID, whose value should be incremented after each delivery. The rate-limiting recommendation to "wait two seconds" was dropped.
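A minimal sketch of the counter idea follows. The function name is hypothetical, and the underscore joining the PID and the counter is an assumption made for illustration; the text above says only that the counter is appended to the PID.

```python
import itertools
import os
import socket
import time

# One counter per process, advanced on every delivery.
_delivery_counter = itertools.count()


def counter_unique_name(now=None):
    """Build time.pid_counter.hostname; the '_' joiner is an assumption."""
    seconds = int(time.time() if now is None else now)
    return "{}.{}_{}.{}".format(
        seconds, os.getpid(), next(_delivery_counter), socket.gethostname()
    )
```

Because the counter changes on every call, two deliveries by the same process within the same second get different names, which is why the two-second wait is no longer needed.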

By 2003, the recommendations had been further amended to require that, instead of the PID and counter, the middle part of the filename be created by "concatenating enough of the following strings to guarantee uniqueness", drawn from a list given in the specification, even in the face of multiple simultaneous deliveries to the same maildir from one or more processes. [7]

In 2006, Timo Sirainen, the creator of Dovecot, criticised this 2003 algorithm as unnecessarily complex. [8]

As of November 2023, qmail author Daniel Bernstein had made no further changes to the 2003 filename generation recommendations. [9] On modern POSIX systems, temporary files can be safely created with the mkstemp C library function.
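Python exposes the same facility as `tempfile.mkstemp`, which creates and opens a file with a name that did not previously exist. The sketch below uses it to create a file inside a maildir's tmp directory. The function name is hypothetical, and the generated names do not follow the time, PID and hostname convention described above.

```python
import os
import tempfile


def create_tmp_file(maildir, data):
    """Write `data` to a freshly created, uniquely named file in maildir/tmp.

    Illustrative helper; the resulting name is chosen by mkstemp.
    """
    # mkstemp creates the file itself, so no other process can be handed
    # the same name in the meantime.
    fd, path = tempfile.mkstemp(dir=os.path.join(maildir, "tmp"))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the contents to disk before any later move
    return path
```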

The delivery process stores the message in the maildir by creating and writing to tmp/uniquefilename, and then moving this file to new/uniquefilename. The move can be done using rename(), which is atomic on many systems. [10] Alternatively, it can be done by hard-linking the file into new and then unlinking it from tmp. Any leftover file will eventually be deleted. This sequence guarantees that a maildir-reading program will not see a partially written message.

There can be multiple programs reading a maildir at the same time. They range from mail user agents (MUAs), which access the server's file system directly, through Internet Message Access Protocol or Post Office Protocol servers acting on behalf of remote MUAs, to utilities such as biff and rsync, which may or may not be aware of the maildir structure. Readers should never look in tmp.
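The write-then-rename delivery sequence can be sketched as follows. The function name `deliver` is hypothetical, the caller is assumed to supply an already unique filename, and the tmp and new directories are assumed to exist on the same filesystem.

```python
import os


def deliver(maildir, unique_name, message_bytes):
    """Write a message to tmp/ and then rename it into new/.

    Illustrative sketch: `unique_name` is assumed to be unique already.
    """
    tmp_path = os.path.join(maildir, "tmp", unique_name)
    new_path = os.path.join(maildir, "new", unique_name)
    # O_EXCL makes creation fail instead of overwriting an existing tmp file.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())
    # Readers only ever see the complete file appear in new/.
    # On POSIX systems, os.rename silently replaces an existing destination.
    os.rename(tmp_path, new_path)
    return new_path
```

Because the message becomes visible in new only through the rename, a reader scanning new never observes a half-written file.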

When a cognizant maildir-reading process (either a POP or IMAP server, or a mail user agent acting locally) finds messages in the new directory, it must move them to cur. The new directory is just a means to notify the user "you have X new messages". [11] This move needs to be done using the atomic filesystem rename(), as the alternative link-then-unlink technique is non-atomic and may result in duplicated messages.

An informational suffix is appended to filenames at this stage. It consists of a colon (to separate the unique part of the filename from the actual information), a "2", a comma and various flags. The "2" specifies the version of the information that follows the comma. "2" is the only currently officially specified version, "1" being an experimental version. The specification defines flags that show whether the message has been read, deleted and so on: the initial (capital) letter of "Passed", "Replied", "Seen", "Trashed", "Draft", and "Flagged". [7] Applications often choose to supplement this very limited set of flags. For example, notmuch [12] offers flag synchronization in addition to arbitrary user-defined flags, [13] while Dovecot uses lowercase letters to match 26 IMAP keywords, [6] which may include keywords such as $MDNSent or user-defined flags.
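The move from new to cur and the ":2," suffix can be illustrated with the sketch below. Both function names are hypothetical. Sorting the flag letters is a choice made here for a stable result, and the parser handles only the version "2" suffix and the six standard flags listed above.

```python
import os

# The six flags named by the specification, keyed by their capital letter.
FLAG_NAMES = {
    "P": "Passed", "R": "Replied", "S": "Seen",
    "T": "Trashed", "D": "Draft", "F": "Flagged",
}


def move_new_to_cur(maildir, name, flags=""):
    """Rename new/name to cur/name:2,flags and return the new path."""
    suffix = ":2," + "".join(sorted(set(flags)))
    dest = os.path.join(maildir, "cur", name + suffix)
    os.rename(os.path.join(maildir, "new", name), dest)
    return dest


def parse_flags(filename):
    """Split 'unique:2,FLAGS' into (unique, set of standard flag names)."""
    unique, sep, info = filename.partition(":")
    if not sep or not info.startswith("2,"):
        return unique, set()  # no version-2 information suffix present
    # Unknown letters (for example application-specific ones) are skipped.
    return unique, {FLAG_NAMES[c] for c in info[2:] if c in FLAG_NAMES}
```

For example, `parse_flags("1204680122.27849_1.foo:2,RS")` yields the unique part `"1204680122.27849_1.foo"` together with the flag names Replied and Seen.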

Although Maildir was intended to allow lockless usage, in practice some software that uses Maildirs, such as Dovecot, also uses locks. [14]

File-system compatibility issues

The Maildir standard can only be implemented on systems that accept colons in filenames. [15]

Systems that don't allow colons in filenames (this includes Microsoft Windows and some configurations of Novell Storage Services) can use a non-standard alternative separator, such as ";" or "-". It is often trivial to patch free and open-source software to use a different separator. [16]

As there is currently no agreement on what character this alternative separator should be, there can be interoperability difficulties between different Maildir-supporting programs on these systems. However, not all Maildir-related software needs to know what the separator character is, because not all of it needs to read or modify the flags of a message ("read", "replied to", etc.). Software that merely delivers to a Maildir, or archives old messages from it based only on date, should work no matter what separator is in use. If only the MUA needs to read or modify message flags, and only one MUA is used, then non-standard alternative separators may be used without interoperability problems.
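One way for flag-aware software to cope with this is to make the separator a parameter rather than hard-coding the colon. The sketch below shows the idea; the function name is hypothetical. Python's `mailbox` module takes a similar approach through its `Maildir.colon` attribute.

```python
def split_flags(filename, separator=":"):
    """Split a maildir filename into (unique part, flag letters).

    `separator` is ':' by default; pass ';' or '-' on systems that
    cannot store colons in filenames. Hypothetical helper for illustration.
    """
    marker = separator + "2,"
    # rpartition is used so that a separator character occurring earlier
    # in the name (such as '-' inside a hostname) is not mistaken for it.
    unique, found, flags = filename.rpartition(marker)
    if not found:
        return filename, ""  # no information suffix present
    return unique, flags
```

A program that only delivers or archives messages never needs to call such a function, which is why it is unaffected by the choice of separator.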

Notes and references

  1. Bernstein, Daniel J. (1995). "maildir(5)". Archived from the original on 1997-10-12. Retrieved 2018-11-23.
  2. Blum, Richard (2001). Postfix. Sams Publishing. ISBN 978-0-672-32114-6.
  3. Varshavchik, Sam (2009). "maildir(5)". Archived from the original on 2024-04-17. Retrieved 2024-08-09.
  4. Varshavchik, Sam (2011). "Maildir++". Archived from the original on 2024-05-29. Retrieved 2024-08-09.
  5. Bernstein, Daniel J. (c. 2000) [First published 2000 or earlier]. "Using maildir format". Archived from the original on 2000-09-02. Retrieved 2018-11-23.
  6. Dovecot Wiki: maildir format
  7. Bernstein, Daniel J. (2003) [The earliest version of this document was first published in 2000 or earlier]. "Using maildir format". Archived from the original on 2003-04-01. Retrieved 2018-11-23.
  8. Sirainen, Timo. "Maildir Mailbox Format: Mail Delivery". Archived from the original on 2024-06-24. Retrieved 2024-08-09. All this trouble is rather pointless. Only the first step is what really guarantees that the mails won't get overwritten, the rest just sounds nice. Even though they might catch a problem once in a while, they give no guaranteed protection and will just as easily pass duplicate filenames through and overwrite existing mails.¶ Step 2 is pointless because there's a race condition between steps 2 and 3. PID/host combination by itself should already guarantee that it never finds such a file. If it does, something's broken and the stat() check won't help since another process might be doing the same thing at the same time, and you end up writing to the same file in tmp/, causing the mail to get corrupted.¶ In step 4 the link() would fail if an identical file already existed in the Maildir, right? Wrong. The file may already have been moved to cur/ directory, and since it may contain any number of flags by then you can't check with a simple stat() anymore if it exists or not.¶ Step 2 was pointed out to be useful if clock had moved backwards. However, this doesn't give any actual safety guarantees because an identical base filename could already exist in cur/. Besides if the system was just rebooted, the file in tmp/ could probably be even overwritten safely (assuming it wasn't already link()ed to new/).¶ So really, all that's important in not getting mails overwritten in your Maildir is step 1: Always create filenames that are guaranteed to be unique. Forget about the 2 second waits and such that the Qmail's man page talks about.
  9. "Wayback Machine snapshots of cr.yp.to/proto/maildir.html". Internet Archive. 2023. Retrieved 2023-11-23.
  10. "rename". The Open Group. 2013. Retrieved 23 July 2016. That specification requires that the action of the function be atomic.
  11. Sam Varshavchik (25 July 2016). "Management of maildir structures". courier-users (Mailing list). Retrieved 26 July 2016.
  12. "Notmuch mail system homepage". notmuchmail.org. Retrieved 2019-06-22.
  13. "notmuch 0.38.3 documentation". notmuch-config. Retrieved 2024-04-17.
  14. Sirainen, Timo. "Maildir Mailbox Format: Locking". Archived from the original on 2024-06-24. Retrieved 2024-08-09.
  15. "mailbox — Manipulate mailboxes in various formats". Python documentation. Retrieved 2023-06-19.
  16. mutt maildir support: workaround for filesystems that don't accept colons
  17. "aerc - the world's best email client homepage". aerc-mail.org.
  18. "Notmuch mail system homepage". notmuchmail.org. Retrieved 2019-06-22.
  19. "Maildir in Thunderbird". mozilla.org. Retrieved 2020-12-06.
