Many email clients now offer some support for Unicode. Some clients will automatically choose between a legacy encoding and Unicode depending on the mail's content, either automatically [1] or when the user requests it. [2]
Technical requirements for sending of messages containing non-ASCII characters by email include
If the sender's or recipient's email address contains non-ASCII characters, sending of a message requires also encoding of these to a format that can be understood by mail servers.
To use Unicode in certain email header fields, e.g. subject lines, sender and recipient names, the Unicode text has to be encoded using a MIME "Encoded-Word" with a Unicode encoding as the charset. To use Unicode in the domain part of email addresses, IDNA encoding must traditionally be used. Alternatively, SMTPUTF8 [3] allows the use of UTF-8 encoding in email addresses (both in a local part and in domain name) as well as in a mail header section. Various standards had been created to retrofit the handling of non-ASCII data to the originally ASCII-only email protocol:
As with all encodings apart from US-ASCII, when using Unicode text in email, MIME must be used to specify that a Unicode transformation format is being used for the text.
UTF-7, an obsolete encoding, had an advantage over Unicode encodings, on obsolete non-8bit-clean networks, in that it does not require a transfer encoding to fit within the seven-bit limits of legacy Internet mail servers. On the other hand, UTF-16 must be transfer encoded to fit SMTP data format. Although not strictly required, UTF-8 is usually also transfer encoded to avoid problems across seven-bit mail servers. MIME transfer encoding of UTF-8 makes it either unreadable as a plain text (in the case of base64) or, for some languages and types of text, heavily size inefficient (in the case of quoted-printable).
Some document formats, such as HTML, PostScript and Rich Text Format have their own 7-bit encoding schemes for non-ASCII characters and can thus be sent without using any special email encodings. E.g. HTML email can use HTML entities to use characters from anywhere in Unicode even if the HTML source text for the email is in a legacy encoding (e.g. 7-bit ASCII). For details of this see Unicode and HTML.
Email is a method of transmitting and receiving messages using electronic devices. It was conceived in the late–20th century as the digital version of, or counterpart to, mail. Email is a ubiquitous and very widely used communication medium; in current use, an email address is often treated as a basic and necessary part of many processes in business, commerce, government, education, entertainment, and other spheres of daily life in most countries.
Multipurpose Internet Mail Extensions (MIME) is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).
The Simple Mail Transfer Protocol (SMTP) is an Internet standard communication protocol for electronic mail transmission. Mail servers and other message transfer agents use SMTP to send and receive mail messages. User-level email clients typically use SMTP only for sending messages to a mail server for relaying, and typically submit outgoing email to the mail server on port 587 or 465 per RFC 8314. For retrieving messages, IMAP is standard, but proprietary servers also often implement proprietary protocols, e.g., Exchange ActiveSync.
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format – 8-bit. Almost every webpage is stored in UTF-8.
8-bit clean is an attribute of computer systems, communication channels, and other devices and software, that process 8-bit character encodings without treating any byte as an in-band control code.
In computer programming, Base64 is a group of binary-to-text encoding schemes that transforms binary data into a sequence of printable characters, limited to a set of 64 unique characters. More specifically, the source binary data is taken 6 bits at a time, then this group of 6 bits is mapped to one of 64 unique characters.
An email address identifies an email box to which messages are delivered. While early messaging systems used a variety of formats for addressing, today, email addresses follow a set of specific rules originally standardized by the Internet Engineering Task Force (IETF) in the 1980s, and updated by RFC 5322 and 6854. The term email address in this article refers to just the addr-spec in Section 3.4 of RFC 5322. The RFC defines address more broadly as either a mailbox or group. A mailbox value can be either a name-addr, which contains a display-name and addr-spec, or the more common addr-spec alone.
UTF-7 is an obsolete variable-length character encoding for representing Unicode text using a stream of ASCII characters. It was originally intended to provide a means of encoding Unicode text for use in Internet E-mail messages that was more efficient than the combination of UTF-8 with quoted-printable.
Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames. Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the letter–digit–hyphen (LDH) subset. For example, München is encoded as Mnchen-3ya.
An internationalized domain name (IDN) is an Internet domain name that contains at least one label displayed in software applications, in whole or in part, in non-Latin script or alphabet or in the Latin alphabet-based characters with diacritics or ligatures. These writing systems are encoded by computers in multibyte Unicode. Internationalized domain names are stored in the Domain Name System (DNS) as ASCII strings using Punycode transcription.
Quoted-Printable, or QP encoding, is a binary-to-text encoding system using printable ASCII characters to transmit 8-bit data over a 7-bit data path or, generally, over a medium which is not 8-bit clean. Historically, because of the wide range of systems and protocols that could be used to transfer messages, e-mail was often assumed to be non-8-bit-clean – however, modern SMTP servers are in most cases 8-bit clean and support 8BITMIME
extension. It can also be used with data that contains non-permitted octets or line lengths exceeding SMTP limits. It is defined as a MIME content transfer encoding for use in e-mail.
Sender Policy Framework (SPF) is an email authentication method which ensures the sending mail server is authorized to originate mail from the email sender's domain. This authentication only applies to the email sender listed in the "envelope from" field during the initial SMTP connection. If the email is bounced, a message is sent to this address, and for downstream transmission it typically appears in the "Return-Path" header. To authenticate the email address which is actually visible to recipients on the "From:" line, other technologies such as DMARC must be used. Forgery of this address is known as email spoofing, and is often used in phishing and email spam.
A bounce message or just "bounce" is an automated message from an email system, informing the sender of a previous message that the message has not been delivered. The original message is said to have "bounced".
The following tables compare general and technical features of notable email client programs.
Email authentication, or validation, is a collection of techniques aimed at providing verifiable information about the origin of email messages by validating the domain ownership of any message transfer agents (MTA) who participated in transferring and possibly modifying a message.
This article compares Unicode encodings in two types of environments: 8-bit clean environments, and environments that forbid the use of byte values with the high bit set. Originally, such prohibitions allowed for links that used only seven data bits, but they remain in some standards and so some standard-conforming software must generate messages that comply with the restrictions. The Standard Compression Scheme for Unicode and the Binary Ordered Compression for Unicode are excluded from the comparison tables because it is difficult to simply quantify their size.
Sieve is a programming language that can be used for email filtering. It owes its creation to the CMU Cyrus Project, creators of Cyrus IMAP server.
DomainKeys Identified Mail (DKIM) is an email authentication method designed to detect forged sender addresses in email, a technique often used in phishing and email spam.
Email forwarding generically refers to the operation of re-sending a previously delivered email to an email address to one or more different email addresses.
International email arises from the combined provision of internationalized domain names (IDN) and email address internationalization (EAI). The result is email that contains international characters, encoded as UTF-8, in the email header and in supporting mail transfer protocols. The most significant aspect of this is the allowance of email addresses in most of the world's writing systems, at both interface and transport levels.