In computer networking, a system is 8-bit clean if it processes 8-bit character encodings without altering the high bit or treating any byte as an in-band control code. This property can describe both a communications protocol and the software and devices that implement such protocols. Although many early email systems only supported 7-bit data, the vast majority of modern email systems are 8-bit clean.
Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters like end-of-text (ETX) as control characters. Others assumed a stream of seven-bit characters, with values between 0 and 127; for example, the ASCII standard used only seven bits per character, avoiding an eight-bit representation in order to save on data transmission costs. On computers and data links using 8-bit bytes, this left the top bit of each byte free for use as a parity bit, flag bit, or metadata control bit. Seven-bit systems and data links are unable to directly handle more complex character codes which are commonplace in non-English-speaking countries with larger alphabets.
Binary files consisting of 8-bit octets cannot be transmitted directly through 7-bit data channels. To work around this, binary-to-text encodings have been devised that use only 7-bit ASCII characters. These include uuencoding, Ascii85, SREC, BinHex, Kermit and MIME's Base64. EBCDIC-based systems cannot handle all characters used in uuencoded data; the Base64 encoding does not have this problem.
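As a minimal sketch of why such encodings help, the following Python snippet sends every possible octet value through a simulated 7-bit link, once raw and once as Base64. The function seven_bit_channel is a hypothetical stand-in for a link that clears the high bit; it is not part of any real protocol implementation.

```python
import base64


def seven_bit_channel(data: bytes) -> bytes:
    """Hypothetical 7-bit link: clears the high bit of every byte."""
    return bytes(b & 0x7F for b in data)


payload = bytes(range(256))          # every possible octet value
encoded = base64.b64encode(payload)  # only A-Z, a-z, 0-9, '+', '/' and '='

# Sent raw, the upper half of the byte range is corrupted.
raw_survived = seven_bit_channel(payload) == payload

# Sent as Base64, every byte is already below 128, so nothing is altered.
b64_survived = base64.b64decode(seven_bit_channel(encoded)) == payload

print(raw_survived, b64_survived)
```

The cost is size: Base64 represents every 3 octets as 4 ASCII characters, so the encoded form is about one third larger than the original.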
Historically, messages were transferred over various media, some of which supported only 7-bit data, so in the 20th century an 8-bit message was likely to be garbled in transit. Some implementations ignored the formal discouragement of 8-bit data and passed bytes with the high bit set through unchanged. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
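The definition can be illustrated with a small test harness. This is only a sketch: the three channel functions are hypothetical models of a transparent link, a link that clears the high bit, and a character-oriented link that treats ETX (0x03) as an in-band end-of-text marker.

```python
from typing import Callable


def is_8bit_clean(channel: Callable[[bytes], bytes]) -> bool:
    """Send all 256 byte values through the channel and compare the result."""
    probe = bytes(range(256))
    return channel(probe) == probe


def transparent(data: bytes) -> bytes:
    """Passes every byte through unchanged."""
    return data


def strips_high_bit(data: bytes) -> bytes:
    """Models a 7-bit link: the top bit of each byte is lost."""
    return bytes(b & 0x7F for b in data)


def stops_at_etx(data: bytes) -> bytes:
    """Models a link that treats ETX (0x03) as end of text and truncates."""
    end = data.find(b"\x03")
    return data if end == -1 else data[:end]


# UTF-8 "café" ends in the bytes C3 A9; with the high bit cleared they
# become 43 29, which are the ASCII characters "C" and ")".
garbled = strips_high_bit("café".encode("utf-8"))  # b'cafC)'
```

Only transparent passes the check; the other two fail for the two reasons named in the definition, namely altering the high bit and interpreting a byte as a control code.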
Many early communications protocol standards, such as RFC 780, 788, 821, 2821 and 5321 (for SMTP), RFC 977 (for NNTP) and RFC 1056, were designed to work over such "7-bit" communication links. They specifically require the use of ASCII "transmitted as an 8-bit byte with the high-order bit cleared to zero", and some of these [1] explicitly restrict all data to 7-bit characters.
For the first few decades of email networks (1971 to the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set. [2]
The RFC 788 definition of SMTP, like its predecessor RFC 780, limits Internet Mail to lines (1000 characters or less) of 7-bit US-ASCII characters. [3] [4] [5] [6]
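A rough sketch of how a sender might check a message against these two limits is shown below. The function name find_limit_violations is hypothetical, and the check assumes that the 1000-character limit counts the terminating CRLF; it ignores the dot-stuffing that SMTP applies on the wire.

```python
from typing import List


def find_limit_violations(message: bytes) -> List[str]:
    """Report lines that break the 7-bit or 1000-character limits."""
    problems = []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        # The limit is assumed to include the two-byte CRLF terminator.
        if len(line) + 2 > 1000:
            problems.append(f"line {number}: longer than 1000 characters")
        # Any byte above 127 has its high bit set and is not US-ASCII.
        if any(b > 127 for b in line):
            problems.append(f"line {number}: byte with the high bit set")
    return problems


print(find_limit_violations(b"Hello\r\ncaf\xc3\xa9\r\n"))
```

A message that produces an empty list fits the original transport limits; anything else needs either an encoding step or a transport that has negotiated 8-bit support.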
Later, the format of email messages was redefined in order to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images). [6] The header field "Content-Transfer-Encoding: binary" [a] requires an 8-bit clean transport.
RFC 3977 [7] specifies that "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel" and changes the character set for commands to UTF-8. However, RFC 5536 [8] still limits the character set to ASCII, including RFC 2047 [9] and RFC 2231 [10] MIME encoding of non-ASCII data.
The Internet community generally adds features by extension, allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be broken and requiring that all software worldwide be upgraded to the latest standard. The recommended way to take advantage of 8-bit clean links between machines is to use the ESMTP (RFC 1869) 8BITMIME extension [11] [12] for message bodies and the SMTPUTF8 [13] extension for message headers. Despite this, some mail transfer agents, notably Exim and qmail, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically quoted-printable, "Q-P conversion") required by RFC 6152. In practice this "just-send-8" behaviour causes no problems, because virtually all modern email servers are 8-bit clean. [14]
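The body transformation behind Q-P conversion can be sketched with Python's standard quopri module. This is an illustration only: a real mail transfer agent would also have to rewrite the Content-Transfer-Encoding header and handle line wrapping, which this snippet does not attempt.

```python
import quopri

# UTF-8 "café" contains two bytes above 127 (C3 A9).
body_8bit = "café".encode("utf-8")

# Quoted-printable writes each such byte as "=" followed by two hex digits,
# so the result contains only 7-bit ASCII.
body_7bit = quopri.encodestring(body_8bit)  # b'caf=C3=A9'

# The receiving side reverses the conversion without loss.
restored = quopri.decodestring(body_7bit)

print(body_7bit, restored == body_8bit)
```

Text that is mostly ASCII grows very little under this encoding, which is why quoted-printable rather than Base64 is the typical choice for converting text bodies.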
The maximum total length of a text line including the <CRLF> is 1000 characters (but not counting the leading dot duplicated for transparency).
SMTP as defined in RFC 821 limits the sending of Internet Mail to US-ASCII characters.
Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages