Multipurpose Internet Mail Extensions (MIME) is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).
MIME is an Internet standard. It is specified in a series of requests for comments: RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049. The integration with SMTP email is specified in RFC 1521 and RFC 1522.
Although the MIME formalism was designed mainly for SMTP, its content types are also important in other communication protocols. In the HyperText Transfer Protocol (HTTP) for the World Wide Web, servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated.
MIME originated from the Andrew Messaging System, which was part of the Andrew Project developed at Carnegie Mellon University (CMU), as a cross-platform alternative to the Andrew-specific data format. [1]
The presence of the MIME-Version header field indicates that the message is MIME-formatted. Its value is typically "1.0", and the field appears as follows:
MIME-Version: 1.0
According to MIME co-creator Nathaniel Borenstein, the version number was introduced to permit changes to the MIME protocol in subsequent versions. However, Borenstein admitted shortcomings in the specification that hindered the implementation of this feature: "We did not adequately specify how to handle a future MIME version. ... So if you write something that knows 1.0, what should you do if you encounter 2.0 or 1.1? I sort of thought it was obvious but it turned out everyone implemented that in different ways. And the result is that it would be just about impossible for the Internet to ever define a 2.0 or a 1.1." [2]
The Content-Type header field indicates the media type of the message content, consisting of a type and a subtype, for example:
Content-Type: text/plain
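As a minimal sketch, assuming Python 3 and its standard-library email package, the type, subtype and charset parameter can be read from a parsed header. The two sample messages are hypothetical; the second also shows that a message with no Content-Type is treated as text/plain.

```python
from typing import Optional, Tuple

from email import message_from_string, policy

# Explicit Content-Type: type "text", subtype "plain", plus a charset parameter.
RAW_TYPED = "Content-Type: text/plain; charset=utf-8\n\nHello\n"
# No Content-Type header at all.
RAW_UNTYPED = "Subject: no type given\n\nHello\n"


def describe(raw: str) -> Tuple[str, str, Optional[str]]:
    """Return (type, subtype, charset) for a raw message string."""
    msg = message_from_string(raw, policy=policy.default)
    return (msg.get_content_maintype(),
            msg.get_content_subtype(),
            msg.get_content_charset())  # None when no charset parameter is present


print(describe(RAW_TYPED))    # ('text', 'plain', 'utf-8')
print(describe(RAW_UNTYPED))  # ('text', 'plain', None)
```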
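Through the use of the multipart type, MIME allows mail messages to have parts arranged in a tree structure where the leaf nodes are any non-multipart content type and the non-leaf nodes are any of a variety of multipart types. Among other things, this mechanism supports simple text messages (text/plain, the default content type), text with attachments (multipart/mixed), and alternative renderings of the same content, such as plain text and HTML (multipart/alternative).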
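The original MIME specifications only described the structure of mail messages. They did not address presentation styles. The Content-Disposition header field was added in RFC 2183 to specify the presentation style. A MIME part can have an inline content disposition, meaning that it should be displayed automatically when the message is displayed, or an attachment content disposition, meaning that it is not displayed automatically and requires some action from the user to open it.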
In addition to the presentation style, the field Content-Disposition also provides parameters for specifying the name of the file, the creation date and modification date, which can be used by the reader's mail user agent to store the attachment.
The following example is taken from RFC 2183, where the header field is defined:
Content-Disposition: attachment; filename=genome.jpeg; modification-date="Wed, 12 Feb 1997 16:29:51 -0500";
The filename may be encoded as defined in RFC 2231.
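As a sketch assuming Python's standard email package, the parameters of the RFC 2183 example above can be read from the header. The second header is a hypothetical RFC 2231 example in which the UTF-8 bytes of a non-ASCII file name are percent-encoded after a charset name and an empty language tag.

```python
import email

# The RFC 2183 example, folded across two lines as in the RFC.
RFC2183_EXAMPLE = (
    "Content-Disposition: attachment; filename=genome.jpeg;\n"
    ' modification-date="Wed, 12 Feb 1997 16:29:51 -0500";\n'
    "\n"
)
# Hypothetical RFC 2231 form: filename*=charset'language'percent-encoded-bytes
RFC2231_EXAMPLE = (
    "Content-Disposition: attachment; filename*=utf-8''na%C3%AFve.txt\n"
    "\n"
)


def disposition_info(raw):
    """Return (disposition, filename, modification-date) from a raw header block."""
    msg = email.message_from_string(raw)
    return (msg.get_content_disposition(),   # lowercased "inline"/"attachment"
            msg.get_filename(),              # decodes RFC 2231 values as well
            msg.get_param("modification-date", header="content-disposition"))


print(disposition_info(RFC2183_EXAMPLE))
print(disposition_info(RFC2231_EXAMPLE))
```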
As of 2010, a majority of mail user agents did not follow this prescription fully. The widely used Mozilla Thunderbird mail client ignored the Content-Disposition fields in messages and used independent algorithms to select the MIME parts to display automatically. Thunderbird prior to version 3 also sent out newly composed messages with an inline content disposition for all MIME parts. Most users were unaware of how to set the content disposition to attachment. [3] Many mail user agents also send messages with the file name in the name parameter of the Content-Type header field instead of the filename parameter of the Content-Disposition header field. This practice is discouraged, as the file name should be specified either with the filename parameter alone, or with both the filename and name parameters. [4]
In HTTP, the response header field Content-Disposition: attachment is usually used as a hint to the client to present the response body as a downloadable file. Typically, when receiving such a response, a Web browser prompts the user to save its content as a file, instead of displaying it as a page in a browser window, with filename suggesting the default file name.
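In June 1992, MIME (RFC 1341, since made obsolete by RFC 2045) defined a set of methods for representing binary data in formats other than ASCII text format. The Content-Transfer-Encoding MIME header field has a two-fold significance: if a binary-to-text encoding method has been used, it states which one; if not, it provides a descriptive label for the format of the content, with respect to the presence of 8-bit or binary content.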
The RFC and the IANA's list of transfer encodings define the values 7bit, 8bit, binary, quoted-printable and base64, which are not case-sensitive. '7bit', '8bit', and 'binary' mean that no binary-to-text encoding on top of the original encoding was used. In these cases, the header field is actually redundant for the email client to decode the message body, but it may still be useful as an indicator of what type of object is being sent. The values 'quoted-printable' and 'base64' tell the email client that a binary-to-text encoding scheme was used and that appropriate initial decoding is necessary before the message can be read with its original encoding (e.g. UTF-8).
There is no encoding defined which is explicitly designed for sending arbitrary binary data through SMTP transports with the 8BITMIME extension. Thus, if BINARYMIME isn't supported, base64 or quoted-printable (with their associated inefficiency) are sometimes still useful. This restriction does not apply to other uses of MIME such as Web Services with MIME attachments or MTOM.
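To make the two binary-to-text encodings concrete, here is a minimal sketch using Python's standard quopri and base64 modules on a short UTF-8 string (the sample text is only illustrative). Quoted-printable leaves ASCII readable and escapes each non-ASCII byte as =XX. Base64 maps every 3 input bytes to 4 output characters.

```python
import base64
import quopri

# UTF-8 text with one non-ASCII character (U+00F1 takes two bytes in UTF-8).
original = "Hola, señor!".encode("utf-8")

qp = quopri.encodestring(original)   # ASCII stays readable, other bytes become =XX
b64 = base64.b64encode(original)     # 13 bytes -> 20 characters (with padding)

print(qp)    # b'Hola, se=C3=B1or!'
print(b64)

# Decoding undoes only the transfer encoding and returns the original bytes;
# those bytes are then interpreted in their own charset (UTF-8 here).
print(quopri.decodestring(qp).decode("utf-8"))
print(base64.b64decode(b64).decode("utf-8"))
```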
Since RFC 2822, conforming message header field names and values use ASCII characters; values that contain non-ASCII data should use the MIME encoded-word syntax (RFC 2047) instead of a literal string. This syntax uses a string of ASCII characters indicating both the original character encoding (the "charset") and the content-transfer-encoding used to map the bytes of the charset into ASCII characters.
The form is: "=?charset?encoding?encoded text?=". The charset names the character encoding of the original text and may be any character set registered with IANA. The encoding can be either "Q", denoting Q-encoding, which is similar to the quoted-printable encoding, or "B", denoting base64 encoding. In Q-encoding, the ASCII codes for the question mark ("?") and equals sign ("=") may not be represented directly, because they are used to delimit the encoded word. The ASCII code for space may not be represented directly either, because it could cause older parsers to split up the encoded word undesirably. To make the encoding smaller and easier to read, the underscore is used to represent the ASCII code for space, with the side effect that the underscore itself cannot be represented directly. The use of encoded words in certain parts of header fields imposes further restrictions on which characters may be represented directly.
For example,
Subject: =?iso-8859-1?Q?=A1Hola,_se=F1or!?=
is interpreted as "Subject: ¡Hola, señor!".
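The decoding steps can be sketched by hand for a single encoded word. This assumes Python 3, and the helper names are hypothetical. The regular expression splits out charset, encoding and encoded text. Q-encoded text is then turned back into bytes ("_" becomes a space, "=XX" becomes a hex byte), and the bytes are decoded with the declared charset. Production code would more likely rely on the standard library's email.header.decode_header, which handles complete header values.

```python
import base64
import re

# One encoded word: =?charset?encoding?encoded-text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_q(text: str) -> bytes:
    """Decode Q-encoded text: '_' is a space byte, '=XX' is a hex byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":
            out.append(0x20)                        # underscore stands for space
            i += 1
        elif ch == "=":
            out.append(int(text[i + 1:i + 3], 16))  # e.g. "=A1" -> byte 0xA1
            i += 3
        else:
            out.append(ord(ch))                     # other printable ASCII is literal
            i += 1
    return bytes(out)


def decode_encoded_word(word: str) -> str:
    """Decode one encoded word (Q or B) into a Python str."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded word: {word!r}")
    charset, encoding, text = match.groups()
    raw = decode_q(text) if encoding.upper() == "Q" else base64.b64decode(text)
    return raw.decode(charset)  # interpret the bytes in the declared charset


print(decode_encoded_word("=?iso-8859-1?Q?=A1Hola,_se=F1or!?="))  # ¡Hola, señor!
```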
The encoded-word format is not used for the names of the header fields (for example Subject). These names are usually English terms and are always in ASCII in the raw message. When viewing a message with a non-English email client, the header field names might be translated by the client.
The MIME multipart message declares a boundary in the boundary parameter of the Content-Type header field. This boundary, which must not occur in any of the parts, is placed between the parts and at the beginning and end of the body of the message, with two extra hyphens after the final one, as follows:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=frontier

This is a message with multiple parts in MIME format.
--frontier
Content-Type: text/plain

This is the body of the message.
--frontier
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUgYm9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--
Each part consists of its own content header (zero or more header fields whose names begin with "Content-") and a body. Multipart content can be nested. The Content-Transfer-Encoding of a multipart type must always be "7bit", "8bit" or "binary" to avoid the complications that multiple levels of decoding would pose. The multipart block as a whole does not have a charset. Non-ASCII characters in the part headers are handled by the encoded-word system, and the part bodies can have charsets specified if appropriate for their content type.
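As a sketch assuming Python's standard email package, the multipart/mixed example above can be parsed to show how the boundary separates the parts. It also shows how each part's own Content-Transfer-Encoding is undone independently. The message text is copied from the example, one header or body line per source line.

```python
from email import message_from_string, policy

RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/mixed; boundary=frontier\n"
    "\n"
    "This is a message with multiple parts in MIME format.\n"   # preamble
    "--frontier\n"
    "Content-Type: text/plain\n"
    "\n"
    "This is the body of the message.\n"
    "--frontier\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg"
    "Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==\n"
    "--frontier--\n"
)

msg = message_from_string(RAW, policy=policy.default)
print(msg.get_content_type(), msg.get_boundary())  # multipart/mixed frontier

# walk() would also descend into nested multiparts; iter_parts() lists direct children.
for part in msg.iter_parts():
    # decode=True undoes this part's Content-Transfer-Encoding (base64 here).
    print(part.get_content_type(), part.get_payload(decode=True)[:40])
```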
The MIME standard defines various multipart-message subtypes, which specify the nature of the message parts and their relationship to one another. The subtype is specified in the Content-Type header field of the overall message. For example, a multipart MIME message using the digest subtype would have its Content-Type set as "multipart/digest".
The RFC initially defined four subtypes: mixed, digest, alternative and parallel. A minimally compliant application must support mixed and digest; other subtypes are optional. Applications must treat unrecognized subtypes as "multipart/mixed". Additional subtypes, such as signed and form-data, have since been separately defined in other RFCs.
multipart/mixed is used for sending files with different Content-Type header fields inline (or as attachments). When the parts are pictures or other easily readable files, most mail clients display them inline, unless Content-Disposition: attachment is explicitly specified, in which case they are offered as attachments. The default content type for each part is "text/plain".
The type is defined in RFC 2046. [5]
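A minimal sketch of composing such a message with Python's standard email.message.EmailMessage follows. The subject, attachment bytes and file name are hypothetical. Adding an attachment to a single-part message converts it to multipart/mixed, and the new part is marked with Content-Disposition: attachment.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report attached"            # hypothetical subject
msg.set_content("See the attached data file.\n")  # becomes the text/plain part

# add_attachment() turns the message into multipart/mixed and appends a part;
# giving a filename makes its disposition "attachment".
msg.add_attachment(b"\x00\x01\x02", maintype="application",
                   subtype="octet-stream", filename="data.bin")

print(msg.get_content_type())  # multipart/mixed
for part in msg.iter_parts():
    print(part.get_content_type(), part.get_content_disposition())
```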
multipart/digest is a simple way to send multiple text messages. The default content-type for each part is "message/rfc822".
The MIME type is defined in RFC 2046. [6]
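The effect of the message/rfc822 default can be sketched with Python's standard parser and a hypothetical two-message digest. The parts carry no Content-Type header at all, yet each is read as an embedded message.

```python
from email import message_from_string, policy

RAW_DIGEST = (
    "Content-Type: multipart/digest; boundary=sep\n"
    "\n"
    "--sep\n"
    "\n"                          # empty part header: no Content-Type given
    "Subject: first message\n"
    "\n"
    "Body one.\n"
    "--sep\n"
    "\n"
    "Subject: second message\n"
    "\n"
    "Body two.\n"
    "--sep--\n"
)

digest = message_from_string(RAW_DIGEST, policy=policy.default)
for part in digest.iter_parts():
    inner = part.get_payload(0)   # the embedded message inside the part
    print(part.get_content_type(), inner["Subject"])
```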
The multipart/alternative subtype indicates that each part is an "alternative" version of the same (or similar) content, each in a different format denoted by its "Content-Type" header. The order of the parts is significant. RFC 1341 states: "In general, user agents that compose multipart/alternative entities should place the body parts in increasing order of preference, that is, with the preferred format last." [7]
Systems can then choose the "best" representation they are capable of processing; in general, this will be the last part that the system can understand, although other factors may affect this.
Since a client is unlikely to want to send a version that is less faithful than the plain text version, this structure places the plain text version (if present) first. This makes life easier for users of clients that do not understand multipart messages.
Most commonly, multipart/alternative is used for email with two parts, one plain text (text/plain) and one HTML (text/html). The plain text part provides backwards compatibility while the HTML part allows use of formatting and hyperlinks. Most email clients offer a user option to prefer plain text over HTML; this is an example of how local factors may affect how an application chooses which "best" part of the message to display.
While it is intended that each part of the message represent the same content, the standard does not require this to be enforced in any way. At one time, anti-spam filters would only examine the text/plain part of a message, [8] because it is easier to parse than the text/html part. But spammers eventually took advantage of this, creating messages with an innocuous-looking text/plain part and advertising in the text/html part. Anti-spam software eventually caught up with this trick, penalizing messages with very different text in a multipart/alternative message. [8]
The type is defined in RFC 2046. [9]
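The following sketch uses Python's standard EmailMessage to build a plain-text-plus-HTML message in the recommended order. The helper pick_alternative is hypothetical. It applies the "last part the system can understand" rule. The standard library's get_body() with a preferencelist models a local preference, such as a user setting that favors plain text.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Alternatives"
msg.set_content("Hello, *world*!\n")                         # least preferred, first
msg.add_alternative("<p>Hello, <b>world</b>!</p>\n", subtype="html")  # preferred, last


def pick_alternative(message, understood):
    """Hypothetical helper: return the last part whose type is understood."""
    best = None
    for part in message.iter_parts():
        if part.get_content_type() in understood:
            best = part  # later parts override earlier ones
    return best


print([p.get_content_type() for p in msg.iter_parts()])
print(pick_alternative(msg, {"text/plain", "text/html"}).get_content_type())
# A reader whose local setting prefers plain text:
print(msg.get_body(preferencelist=("plain", "html")).get_content_type())
```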
The multipart/related subtype is used to indicate that each message part is a component of an aggregate whole. It is for compound objects consisting of several inter-related components: proper display cannot be achieved by displaying the constituent parts individually. The message consists of a root part (by default, the first) which references other parts inline, which may in turn reference other parts. Message parts are commonly referenced by Content-ID. The syntax of a reference is unspecified and is instead dictated by the encoding or protocol used in the part.
One common usage of this subtype is to send a web page complete with images in a single message. The root part would contain the HTML document and use image tags to reference images stored in the later parts.
The type is defined in RFC 2387.
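A sketch of the web-page case, assuming Python's standard EmailMessage: the HTML root part refers to the image through a "cid:" URL built from the image part's Content-ID. The domain "example.com" is a placeholder. The image bytes are only the PNG file signature, standing in for a real image.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Content-ID for the image; make_msgid() returns it wrapped in <angle brackets>.
logo_cid = make_msgid(domain="example.com")   # placeholder domain

msg = EmailMessage()
msg["Subject"] = "Page with an inline image"
# Root part (first): the HTML refers to the image as "cid:" + Content-ID
# without its angle brackets.
msg.set_content(f'<p>Our logo:</p><img src="cid:{logo_cid[1:-1]}">\n',
                subtype="html")
# add_related() converts the message to multipart/related and appends the
# image as a later part carrying the matching Content-ID header.
msg.add_related(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png",
                cid=logo_cid)

for part in msg.iter_parts():
    print(part.get_content_type(), part.get("Content-ID"))
```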
multipart/report is a message type that contains data formatted for a mail server to read. It is split between a text/plain part (or some other easily readable content type) and a message/delivery-status part, which contains the data formatted for the mail server to read.
The type is defined in RFC 6522.
A multipart/signed message is used to attach a digital signature to a message. It has exactly two body parts: a body part and a signature part. The whole of the body part, including its MIME header fields, is used to create the signature part. Many signature types are possible, like "application/pgp-signature" (RFC 3156) and "application/pkcs7-signature" (S/MIME).
The type is defined in RFC 1847. [10]
A multipart/encrypted message has two parts. The first part has control information that is needed to decrypt the application/octet-stream second part. Similar to signed messages, there are different implementations which are identified by their separate content types for the control part. The most common types are "application/pgp-encrypted" (RFC 3156) and "application/pkcs7-mime" (S/MIME).
The type is defined in RFC 1847. [11]
The MIME type multipart/form-data is used to express values submitted through a form. Originally defined as part of HTML 4.0, it is most commonly used for submitting files with HTTP. It is specified in RFC 7578, superseding RFC 2388.
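As a hand-rolled sketch of the wire format (not a full RFC 7578 implementation), the hypothetical encode_form_data below writes each form field as a part whose Content-Disposition is form-data with a name parameter. File fields also get a filename and a Content-Type. It assumes field names and file names contain no quotes, so no escaping is done.

```python
import uuid


def encode_form_data(fields, files, boundary=None):
    """Hypothetical helper building a multipart/form-data body.

    fields: dict of name -> text value
    files:  dict of name -> (filename, content_type, data_bytes)
    Returns (Content-Type header value, body bytes). Names are not escaped.
    """
    boundary = boundary or uuid.uuid4().hex
    body = bytearray()
    for name, value in fields.items():
        body += f"--{boundary}\r\n".encode("ascii")
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
        body += value.encode("utf-8") + b"\r\n"
    for name, (filename, ctype, data) in files.items():
        body += f"--{boundary}\r\n".encode("ascii")
        body += (f'Content-Disposition: form-data; name="{name}"; '
                 f'filename="{filename}"\r\n'
                 f"Content-Type: {ctype}\r\n\r\n").encode("utf-8")
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")  # closing delimiter
    return f"multipart/form-data; boundary={boundary}", bytes(body)


ctype, body = encode_form_data({"comment": "hi"},
                               {"upload": ("a.txt", "text/plain", b"abc")},
                               boundary="XyZ")
print(ctype)
print(body.decode("utf-8"))
```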
The content type multipart/x-mixed-replace was developed as part of a technology to emulate server push and streaming over HTTP.
All parts of a mixed-replace message have the same semantic meaning. However, each part invalidates – "replaces" – the previous parts as soon as it is received completely. Clients should process the individual parts as soon as they arrive and should not wait for the whole message to finish.
Originally developed by Netscape, [12] it is still supported by Mozilla, Firefox, Safari, and Opera. It is commonly used in IP cameras as the MIME type for MJPEG streams. [13] It was supported by Chrome for main resources until 2013 (images can still be displayed using this content type). [14]
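A sketch of the server side of such a stream, assuming Python 3, produces each chunk of the response body. The HTTP response would declare Content-Type: multipart/x-mixed-replace; boundary=frame, and each chunk replaces the previous frame once the client has received it completely. The boundary name, the image/jpeg default and the per-part Content-Length header are assumptions of this sketch.

```python
from typing import Iterable, Iterator


def mixed_replace_stream(frames: Iterable[bytes], boundary: str = "frame",
                         content_type: str = "image/jpeg") -> Iterator[bytes]:
    """Yield one body chunk per frame; each new part replaces the previous one."""
    for frame in frames:
        header = (f"--{boundary}\r\n"
                  f"Content-Type: {content_type}\r\n"
                  f"Content-Length: {len(frame)}\r\n\r\n")  # assumed, not required
        yield header.encode("ascii") + frame + b"\r\n"


# A server would write these chunks to the socket as frames become available.
for chunk in mixed_replace_stream([b"AAA", b"BB"]):
    print(chunk)
```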
The multipart/byteranges subtype is used to represent noncontiguous byte ranges of a single message. HTTP uses it when a server returns multiple byte ranges in one response. It was defined in RFC 2616 and is now specified in RFC 9110.
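As a sketch of what such a response body looks like, the hypothetical helper below builds one for inclusive (first, last) byte ranges. A server answering a multi-range request with 206 Partial Content would send it with Content-Type: multipart/byteranges; boundary=... . Each part carries a Content-Range header of the form "bytes first-last/complete-length". The sample resource and boundary are illustrative.

```python
def byteranges_body(resource: bytes, ranges, content_type: str,
                    boundary: str) -> bytes:
    """Hypothetical helper: multipart/byteranges body for inclusive ranges."""
    total = len(resource)
    out = bytearray()
    for first, last in ranges:
        out += (f"--{boundary}\r\n"
                f"Content-Type: {content_type}\r\n"
                f"Content-Range: bytes {first}-{last}/{total}\r\n\r\n"
                ).encode("ascii")
        out += resource[first:last + 1] + b"\r\n"  # last is inclusive
    out += f"--{boundary}--\r\n".encode("ascii")
    return bytes(out)


body = byteranges_body(b"abcdefghijklmnopqrstuvwxyz", [(0, 4), (10, 14)],
                       "text/plain", "B")
print(body.decode("ascii"))
```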