MIX (email)


MIX is a high-performance, indexed, on-disk email storage system designed for use with the IMAP protocol. It was designed by Mark Crispin, the author of the IMAP protocol. Server support for MIX has been included in releases of UW IMAP since 2006, [1] as well as in Panda IMAP and Messaging Architects Netmail. MIX is also supported directly by the Alpine email client.


Design

MIX mailboxes are directories containing several types of files, including a metadata file, an index file, a dynamic status data file, a threading/sorting cache file, and a collection of files containing message content. [2] MIX mailboxes can also contain subordinate mailboxes, which are implemented as subdirectories within the MIX directory.

The MIX format was designed with an emphasis on very high scalability, reliability, and performance, while efficiently supporting modern features of the IMAP protocol. [2] MIX has been used successfully with mailboxes of 750,000 messages.

The base-level MIX format has four kinds of files: a metadata file, an index file, a status file, and a set of message data files. The metadata file contains base-level data applicable to the entire mailbox, namely the UID validity, the last assigned UID, and the list of keywords. The index file contains pointers to each unexpunged message in the message data files, along with flags, size, and IMAP internaldate data. The status file contains per-message flags and keywords.
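The division of labour between these files can be pictured with a small in-memory model. The following is only an illustrative sketch, not the on-disk MIX format: every class and field name is hypothetical, the `data_file` and `offset` fields stand in for whatever pointer the real index stores, and flags are kept only in the status record for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Metadata:
    """Mailbox-wide data: one record for the whole mailbox."""
    uid_validity: int
    last_uid: int
    keywords: list[str] = field(default_factory=list)


@dataclass
class IndexEntry:
    """Locates one unexpunged message inside a message data file."""
    uid: int
    data_file: str      # hypothetical name of the data file holding the message
    offset: int         # hypothetical byte offset of the message in that file
    size: int
    internal_date: str


@dataclass
class StatusEntry:
    """Per-message mutable state."""
    uid: int
    flags: set[str] = field(default_factory=set)
    keywords: set[str] = field(default_factory=set)


@dataclass
class Mailbox:
    metadata: Metadata
    index: dict[int, IndexEntry] = field(default_factory=dict)
    status: dict[int, StatusEntry] = field(default_factory=dict)

    def append(self, data_file, offset, size, internal_date):
        """Assign the next UID and create the index and status records."""
        self.metadata.last_uid += 1
        uid = self.metadata.last_uid
        self.index[uid] = IndexEntry(uid, data_file, offset, size, internal_date)
        self.status[uid] = StatusEntry(uid)
        return uid


mailbox = Mailbox(Metadata(uid_validity=1, last_uid=0))
uid = mailbox.append("data-0", 0, 120, "01-Jan-2009 00:00:00 +0000")
mailbox.status[uid].flags.add("\\Seen")   # touches only the status record
```

In this model, appending a message touches the metadata, index, and status records, while a flag change touches the status record alone.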

All these files may be hidden files in a directory (with the directory name being the name of the mailbox). Thus a directory with gigabytes of mail in it may appear to be empty if examined with tools that don't show hidden files. This is a common source of confusion for system administrators encountering MIX for the first time. [citation needed]
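The effect is easy to reproduce on a Unix-like system, where a leading dot conventionally marks a file as hidden. The sketch below uses a temporary directory and made-up dotfile names (they are placeholders, not the names a MIX implementation uses) to show how a listing that skips dotfiles reports the directory as empty.

```python
import os
import tempfile


def list_entries(path, show_hidden=False):
    """Return sorted entry names, skipping dotfiles unless show_hidden is set."""
    names = sorted(os.listdir(path))
    if show_hidden:
        return names
    return [n for n in names if not n.startswith(".")]


with tempfile.TemporaryDirectory() as mailbox_dir:
    # Hypothetical dotfile names standing in for a mailbox's hidden files.
    for name in (".meta-example", ".index-example", ".status-example"):
        open(os.path.join(mailbox_dir, name), "w").close()

    visible = list_entries(mailbox_dir)
    everything = list_entries(mailbox_dir, show_hidden=True)

print(visible)           # the directory looks empty
print(len(everything))   # but it holds three files
```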

By design, it is possible to recover the mailbox into a usable state if any of these files is lost or corrupted. For example, it is possible to rebuild the index file by reading each of the data files, with no consequence other than the possible "unexpunging" of an expunged message that had not yet had its space recovered.
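The recovery idea can be sketched as a scan over the data files that regenerates one index entry per message found. The record framing used here (a `MSG <uid> <size>` header line before each message body) is invented purely for the example and is not the real MIX data file layout.

```python
import io


def write_record(stream, uid, body):
    """Append one message using an invented framing: a header line, then the body."""
    stream.write(f"MSG {uid} {len(body)}\n".encode("ascii"))
    stream.write(body)


def rebuild_index(data_files):
    """Scan every data file and return {uid: (file_name, body_offset, size)}."""
    index = {}
    for name, stream in data_files.items():
        stream.seek(0)
        while True:
            header = stream.readline()
            if not header:
                break                      # end of this data file
            tag, uid, size = header.split()
            if tag != b"MSG":
                raise ValueError(f"unexpected record header in {name!r}")
            uid, size = int(uid), int(size)
            index[uid] = (name, stream.tell(), size)
            stream.seek(size, io.SEEK_CUR)  # skip the body without reading it
    return index


data = io.BytesIO()
write_record(data, 1, b"hello")
write_record(data, 2, b"world!!")

# Suppose UID 2 had been expunged from the index but its bytes were never reclaimed.
old_index = {1: ("data-0", 8, 5)}
new_index = rebuild_index({"data-0": data})
print(sorted(new_index))   # UID 2 is back: it has been "unexpunged"
```

Because the scan cannot tell an expunged but unreclaimed message from a live one, the rebuilt index contains both, which is the "unexpunging" side effect described above.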

Another important part of the MIX design is that no file is modified unless the data specific to that file is altered; thus a flag change alters the status file but not the metadata or index files. This reduces the impact of any system event that corrupts a file write in progress.

Each file also has a "modification sequence", which is incremented each time the file is changed. When a MIX implementation updates from a file and finds the modification sequence unchanged, it closes the file at once without reading it further. In addition, each status file entry has its own modification sequence, which permits lossless synchronization of message flag and keyword updates from multiple consumers.
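A minimal sketch of this behaviour, using in-memory stand-ins rather than real files, is shown below. The `TrackedFile` and `Reader` classes are hypothetical; they illustrate only that a write bumps the sequence of the one file it touches, and that a reader skips any file whose sequence it has already seen.

```python
class TrackedFile:
    """In-memory stand-in for one mailbox file with a modification sequence."""

    def __init__(self, payload):
        self.modseq = 1
        self.payload = payload

    def write(self, payload):
        self.payload = payload
        self.modseq += 1


class Reader:
    """Re-reads a file only when its modification sequence has moved."""

    def __init__(self):
        self.seen = {}        # file name -> modseq at the last full read
        self.full_reads = 0

    def refresh(self, name, tracked):
        if self.seen.get(name) == tracked.modseq:
            return False      # unchanged: stop without reading further
        self.seen[name] = tracked.modseq
        self.full_reads += 1
        return True


files = {
    "metadata": TrackedFile({"last_uid": 1}),
    "index": TrackedFile({1: "pointer"}),
    "status": TrackedFile({1: set()}),
}
reader = Reader()
for name, tracked in files.items():
    reader.refresh(name, tracked)          # initial load reads all three

# A flag change rewrites only the status file.
files["status"].write({1: {"\\Seen"}})

changed = [name for name, tracked in files.items() if reader.refresh(name, tracked)]
print(changed)
```

After the flag change, only the status file is re-read; the metadata and index files are skipped because their sequences are unchanged.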

Extensions

MIX allows for implementation-specific extensions. All MIX implementations must be interchangeable at the base level, but are not required to implement extensions and must tolerate the absence of extensions.

The UW IMAP and Panda IMAP implementations of MIX have a sort cache file that contains data used by the IMAP SORT and THREAD operators. This permits these operators to load most (if not all) of the data they need without having to parse it from message data.

The Messaging Architects implementation of MIX has extended mailbox metadata (currently used to hold the mailbox's display name), message metadata (used for multiple purposes including a JSON representation of the message structure), and a global modification sequence (thus permitting a fast check for mailbox update without having to check the modification sequence in multiple files). Messaging Architects' implementation also has a "virtual mailbox" or stubbing capability, in which a message in a mailbox is actually a pointer to a message in another mailbox.

Comparisons with other mail storage formats

MIX can be considered a hybrid between the maildir (single message per file) and mbox (single file per mailbox) types of email storage formats. [3]

Versus maildir

MIX has a similarity to maildir, in that MIX mailboxes are directories rather than single files.

Unlike maildir, however, MIX supports an index file for fast opens and mailbox scanning. Where maildir stores each message in its own file on disk, MIX can aggregate messages into message files, according to the configured size limit for a message file. Messages larger than the size limit are not aggregated. A MIX directory will tend to have a smaller number of files than a corresponding maildir mailbox as a result, which can be advantageous on certain operating systems. [4] MIX has support for efficient retrieval and modification of metadata and status information.

MIX aggregates multiple smaller messages into single data files of up to 1 MB in size (larger messages get a data file to themselves). This reduces the number of entries required in the directory, which is important for performance and scalability. The MIX mailbox format requires more rigorous locking support from the operating system than maildir, and was explicitly not designed to support being written to over NFS.
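The aggregation rule can be illustrated with a simple packing function. This is a sketch under stated assumptions rather than the actual allocation logic of any MIX implementation: the function name is hypothetical, messages are represented only by their sizes, they are packed in arrival order, and the default limit of 1,000,000 bytes is one reading of "1 MB".

```python
def assign_to_data_files(sizes, limit=1_000_000):
    """Group message sizes into data files holding at most `limit` bytes each.

    A message larger than `limit` is placed in a data file of its own.
    """
    files = []
    current, used = [], 0
    for size in sizes:
        if size > limit:
            files.append([size])           # oversized: a file to itself
            continue
        if current and used + size > limit:
            files.append(current)          # current file is full, start a new one
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        files.append(current)
    return files


layout = assign_to_data_files([400, 500, 300, 2000, 100], limit=1000)
print(layout)
print(len(layout), "data files for 5 messages")
```

With a one-file-per-message layout the same five messages would need five files; here they need three.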

Maildir, on the other hand, was designed to work in an NFS environment. [5] Maildir enjoys wider client, server, and tool support than MIX.

Versus mbox

MIX enjoys considerable optimization versus the common mbox mail format. MIX has a binary index to accelerate scanning and retrieval of messages, whereas mbox requires full linear scans to extract messages. Like maildir, and unlike mbox, MIX supports mailboxes that contain both messages and subordinate mailboxes. MIX supports multiple clients concurrently reading and writing to individual mailboxes, which cannot be achieved with mbox.
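The difference between a linear scan and an indexed fetch can be sketched as follows. The example works on a tiny mbox-style byte string and builds a plain list of offsets as its "index"; it illustrates the general access pattern only and does not reproduce the MIX index format. All function names are hypothetical.

```python
import io

MBOX = (
    b"From alice@example.org Thu Jan  1 00:00:00 2009\n"
    b"Subject: one\n\nfirst body\n\n"
    b"From bob@example.org Thu Jan  1 00:00:01 2009\n"
    b"Subject: two\n\nsecond body\n\n"
)


def scan_for_message(stream, n):
    """Linear access: read from the start, counting 'From ' separator lines."""
    stream.seek(0)
    count, lines = 0, []
    for line in stream:
        if line.startswith(b"From "):
            count += 1
            if count > n:
                break
            continue
        if count == n:
            lines.append(line)
    return b"".join(lines)


def build_index(stream):
    """One pass that records (offset, length) for each message."""
    stream.seek(0)
    entries, offset, start = [], 0, None
    for line in stream:
        if line.startswith(b"From "):
            if start is not None:
                entries.append((start, offset - start))
            start = offset + len(line)     # message begins after the separator
        offset += len(line)
    if start is not None:
        entries.append((start, offset - start))
    return entries


def fetch_indexed(stream, entries, n):
    """Indexed access: seek straight to message n and read exactly its bytes."""
    start, length = entries[n - 1]
    stream.seek(start)
    return stream.read(length)


stream = io.BytesIO(MBOX)
index = build_index(stream)
print(fetch_indexed(stream, index, 2) == scan_for_message(stream, 2))
```

Both functions return the same bytes, but the indexed fetch does a single seek and read, while the scan must read every line that precedes the requested message.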

On the other hand, the mbox format is far more widely supported than MIX. mbox is a ubiquitous mailbox file format, and is often used as a lowest-common-denominator exchange format.

References

  1. "Announcing: UW IMAP toolkit 2006 (imap-2006)". Archived from the original on 2012-09-29. Retrieved 2009-04-04.
  2. "Re: Benchmarking imap, filesystems". Archived from the original on 2012-09-29. Retrieved 2009-04-04.
  3. "Re: mix format". Archived from the original on 2011-07-26. Retrieved 2009-04-04.
  4. "answered flag updates close other alpine sessions". Archived from the original on 2011-07-20. Retrieved 2009-04-04.
  5. "Using maildir format". Archived from the original on 2000-09-02. Retrieved 2009-05-22.