Mbox

Mbox is a generic term for a family of related file formats used for holding collections of email messages. It was first implemented in Fifth Edition Unix.

All messages in an mbox mailbox are concatenated and stored as plain text in a single file. Each message starts with the four characters "From" followed by a space (the so-called "From_ line") and the sender's email address. RFC 4155 defines that a UTC timestamp follows after another separating space character. [1]
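As a rough illustration of the "From_ line" layout described above, the following Python sketch splits such a line into its sender and timestamp. The function name `parse_from_line` is hypothetical, and the sketch assumes the asctime-style timestamp used in this article's example; real mailboxes vary considerably and may not parse this way.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_from_line(line: str) -> Optional[Tuple[str, datetime]]:
    """Split a From_ line into (sender, UTC timestamp), or return None.

    Assumes the layout "From <sender> <asctime-style timestamp>".
    """
    if not line.startswith("From "):
        return None
    parts = line.split(None, 2)  # "From", sender, rest of the line
    if len(parts) != 3:
        return None
    _, sender, stamp = parts
    # Collapse runs of whitespace, e.g. the padded day in "Jul  8".
    stamp = " ".join(stamp.split())
    try:
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None  # timestamp is in some other, unsupported layout
    # Assumption: the timestamp is UTC, as RFC 4155 describes.
    return sender, when.replace(tzinfo=timezone.utc)
```

A header line such as `From: Author <author@example.com>` is not mistaken for a From_ line, because the fifth character is a colon rather than a space.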

However, as noted in the RFC, there is enormous variation between different storage systems. As a specific example, if exporting via IMAP the popular Gmail service uses - as a placeholder in lieu of the sender's address, follows this with a timestamp representing either the time the IMAP export was configured or the time of reception (whichever is more recent), and makes no attempt to escape "From -" strings which appear in the body of an email.

A format similar to mbox is the MH Message Handling System. Other systems, such as Microsoft Exchange Server and the Cyrus IMAP server, store mailboxes in centralized databases managed by the mail system and not directly accessible by individual users. The maildir mailbox format is often cited as an alternative to the mbox format for networked email storage systems.

Mail storage protocols

Unlike the Internet protocols used for the exchange of email, the format used for the storage of email has never been formally defined through the RFC standardization mechanism and has been left entirely to the developers of email clients. However, the POSIX standard defined a loose framework in conjunction with the mailx program. In 2005, the application/mbox media type was standardized as RFC 4155, which hinted that mbox stores mailbox messages in their original Internet Message (RFC 2822) format, except for the newline convention used, the restriction to seven-bit clean data, and the requirement that each newly added message be terminated with a completely empty line within the mbox database. [1] [2]

Mbox family

The mbox format uses a single blank line followed by the string 'From ' (with a space) to delimit messages; this can create ambiguities if a message contains the same sequence in the message text.
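The delimiter rule above can be sketched in a few lines of Python. This is a minimal illustration rather than any particular mail tool's implementation: the function name `split_mbox` is hypothetical, and the sketch assumes the whole mailbox fits in memory as text.

```python
from typing import List, Optional


def split_mbox(text: str) -> List[str]:
    """Split mbox text into messages at 'From ' lines that follow a blank line."""
    messages: List[str] = []
    current: Optional[List[str]] = None
    previous_blank = True  # the start of the file counts as a boundary
    for line in text.splitlines(keepends=True):
        if previous_blank and line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = []  # the From_ line itself is not part of the message
        elif current is not None:
            current.append(line)
        previous_blank = line in ("\n", "\r\n")
    if current is not None:
        messages.append("".join(current))
    return messages
```

The ambiguity is easy to reproduce with this sketch: if a message body contains an unquoted line starting with "From " directly after a blank line, the function reports one message more than the mailbox actually holds.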

Over the years, four popular but incompatible variants arose: mboxo, mboxrd, mboxcl, and mboxcl2. The naming scheme was developed by Daniel J. Bernstein, Rahul Dhesi, and others in 1996. Each originated from a different version of Unix. mboxcl and mboxcl2 originated from the file format used by Unix System V Release 4 mail tools. mboxrd was invented by Rahul Dhesi et al. as a rationalization of mboxo and subsequently adopted by some Unix mail tools including qmail.

All these variants share the problem that the content of a message sometimes must be modified to remove ambiguities, as shown below. Applications therefore have to know which quoting rule was used in order to reverse it correctly, which turned out to be impractical. Using MIME and choosing a content-transfer-encoding that quotes "From_" lines in a standard-compliant fashion ensures that only the MIME representation of a message changes, not its content. Checksums therefore remain constant, a necessary precondition for supporting S/MIME and Pretty Good Privacy. Applications that create new messages and store them in mbox database files will likely use this approach to decouple message content from the database storage format.

mboxo and mboxrd locate the message start by scanning for From lines that are found before the email message headers. If a "From " string occurs at the beginning of a line in either the header or the body of a message (a mail standard violation for the former, but not for the latter), the email message must be modified before the message is stored in an mbox mailbox file or the line will be taken as a message boundary. To avoid misinterpreting a "From " string at the beginning of the line in the email body as the beginning of a new email, some systems "From-munge" [3] the message, typically by prepending a greater-than sign:

   >From my point of view...

In the mboxo format, such lines are irreversibly ambiguous, which can lead to corruption of the message. [4] If a line already contained ">From " at the beginning (such as in a quotation), it is left unchanged when written. When the message is subsequently read by the mail software, the leading > is erroneously removed. The mboxrd format solves this by converting "From " to ">From ", ">From " to ">>From ", and so on. The transformation is then always reversible. [5]

Example:

   From MAILER-DAEMON Fri Jul  8 12:08:34 2011
   From: Author <author@example.com>
   To: Recipient <recipient@example.com>
   Subject: Sample message 1

   This is the body.
   >From (should be escaped).
   There are 3 lines.

   From MAILER-DAEMON Fri Jul  8 12:08:34 2011
   From: Author <author@example.com>
   To: Recipient <recipient@example.com>
   Subject: Sample message 2

   This is the second body.
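The mboxrd quoting rule described above can be sketched as a pair of functions. The names `mboxrd_quote` and `mboxrd_unquote` are hypothetical, and the sketch assumes the message body is available as text with LF line endings.

```python
import re

# A line needs quoting if it starts with zero or more ">" followed by "From ".
_NEEDS_QUOTING = re.compile(r">*From ")
# A line was quoted if it starts with at least one ">" followed by "From ".
_WAS_QUOTED = re.compile(r">+From ")


def mboxrd_quote(body: str) -> str:
    """Prepend one '>' to every line matching '>*From ' (mboxrd writing rule)."""
    return "\n".join(
        ">" + line if _NEEDS_QUOTING.match(line) else line
        for line in body.split("\n")
    )


def mboxrd_unquote(body: str) -> str:
    """Strip exactly one '>' from every line matching '>+From ' (mboxrd reading rule)."""
    return "\n".join(
        line[1:] if _WAS_QUOTED.match(line) else line
        for line in body.split("\n")
    )
```

Because quoting adds exactly one ">" to every matching line and unquoting removes exactly one, `mboxrd_unquote(mboxrd_quote(body))` returns the original body. Under the mboxo rule, by contrast, a body line ">From b" is stored unchanged and cannot be told apart from a quoted "From b" when read back.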

The mboxcl and mboxcl2 formats use a Content-Length: header to determine the messages' lengths and thereby the next real From line. mboxcl still quotes From  lines in the messages themselves as mboxrd does, while mboxcl2 doesn't.
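To illustrate how a Content-Length: header lets a reader skip over a body without inspecting it, here is a minimal sketch of an mboxcl2-style splitter. The function name `split_mboxcl2` is hypothetical, and the sketch assumes LF line endings, a correct Content-Length value counted in bytes, and a header present on every message; real tools have to cope with mailboxes where none of these holds.

```python
import re
from typing import List

_CONTENT_LENGTH = re.compile(
    rb"^Content-Length:[ \t]*(\d+)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def split_mboxcl2(data: bytes) -> List[bytes]:
    """Split mboxcl2-style data into messages using Content-Length headers."""
    messages: List[bytes] = []
    pos = 0
    while pos < len(data):
        if not data.startswith(b"From ", pos):
            raise ValueError(f"expected a From_ line at offset {pos}")
        start = data.index(b"\n", pos) + 1           # first byte after the From_ line
        body_start = data.index(b"\n\n", start) + 2  # first byte after the headers
        match = _CONTENT_LENGTH.search(data, start, body_start)
        if match is None:
            raise ValueError("message has no Content-Length header")
        body_end = body_start + int(match.group(1))
        messages.append(data[start:body_end])
        pos = body_end
        # Skip the blank line(s) separating this message from the next From_ line.
        while data.startswith(b"\n", pos):
            pos += 1
    return messages
```

Since the body is skipped by length, a body line that begins with "From " needs no quoting here, which matches the mboxcl2 behaviour described above. The price is that a wrong Content-Length value makes the reader lose track of every following message.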

Modified mbox

Some email clients use a modification of the mbox format for their mail folders.

File locking

Because more than one message is stored in a single file, some form of file locking is needed to avoid the corruption that can result from two or more processes modifying the mailbox simultaneously. This could happen if a network email delivery program delivers a new message at the same time as a mail reader is deleting an existing message.

Various mutually incompatible mechanisms have been used by different mbox formats to enable message file locking, including fcntl() and lockf(). These do not work well with network-mounted file systems, such as the Network File System (NFS), which is why Unix traditionally used additional "dot lock" files, which could be created atomically even over NFS.

Mbox files should also be locked while they are being read. Otherwise, the reader may see corrupted message contents if another process is modifying the mbox at the same time, even though no actual file corruption occurs.
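The "dot lock" idea mentioned above can be sketched as a small context manager. This is a simplified illustration under stated assumptions: the name `dot_lock`, the `.lock` suffix, and the timeout parameters are choices made for this sketch, and it relies on exclusive file creation (`O_CREAT | O_EXCL`) being atomic on the file system in use. Historical implementations often used other techniques, such as hard links, where exclusive creation was not reliable.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def dot_lock(mbox_path, timeout=10.0, poll_interval=0.1):
    """Hold '<mbox_path>.lock' for the duration of a with-block."""
    lock_path = mbox_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        # Record the owner's process ID as a debugging aid.
        with os.fdopen(fd, "w") as lock_file:
            lock_file.write(str(os.getpid()))
        yield lock_path
    finally:
        os.unlink(lock_path)
```

A reader or writer would wrap its access in `with dot_lock("/path/to/inbox"):`. The scheme only works if every cooperating program uses the same lock file name, and a lock left behind by a crashed process has to be cleaned up by some other means.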

As a patch format

In open source development, it is common to send patches in the diff format to a mailing list for discussion. The diff format allows for irrelevant "headers", such as mbox data, to be added. [8] [9] Version control systems like git have support for generating mbox-formatted patches and for sending them to the list as emails in a thread. [10] [11]

References

  1. Hall, E., ed. (September 2005). "Request for Comments: 4155 – The application/mbox Media Type". Internet Engineering Task Force. Archived from the original on 17 May 2021. Retrieved 17 May 2021.
  2. Resnick, P., ed. (April 2001). "Request for Comments: 2822 – Internet Message Format". Internet Engineering Task Force. Archived from the original on 31 March 2023. Retrieved 17 May 2021.
  3. Gellens, R., ed. (February 2004). "Request for Comments: 3676 – The Text/Plain Format and DelSp Parameters – Section 4.4: Space-Stuffing". Internet Engineering Task Force. Archived from the original on 16 May 2021. Retrieved 17 May 2021.
  4. "Configuring Netscape Mail On Unix: Why the Content-Length Format is Bad", by Jamie Zawinski, 1997. Archived 2009-04-08 at the Wayback Machine.
  5. de Boyne Pollard, Jonathan (2004). ""mbox" is a family of several mutually incompatible mailbox formats". Frequently Given Answers. Archived from the original on 31 December 2020. Retrieved 20 March 2023.
  6. "Eudora 6.2.4 Mac User Guide" (PDF). p. 113. Archived from the original (PDF) on 2014-07-12. Retrieved 2015-10-29.
  7. "Importing and exporting your mail - MozillaZine Knowledge Base". kb.mozillazine.org. Archived from the original on 2013-07-03. Retrieved 2011-06-18.
  8. "Submitting patches: the essential guide to getting your code into the kernel — The Linux Kernel documentation". www.kernel.org. Archived from the original on 2019-10-27. Retrieved 2020-03-03.
  9. Randal, Allison; Sugalski, Dan; Tötsch, Leopold (2003). "Patch submission". Perl 6 Essentials. O'Reilly Media, Inc. p. 14. ISBN 978-0-596-00499-6.
  10. "Git - git-format-patch Documentation". git-scm.com. Archived from the original on 2020-03-07. Retrieved 2020-03-03.
  11. "Git - git-send-email Documentation". git-scm.com. Archived from the original on 2020-02-21. Retrieved 2020-03-03.
