8-bit clean is an attribute of computer systems, communication channels, and other devices and software, that process 8-bit character encodings without treating any byte as an in-band control code.
Until the early 1990s, many programs and data transmission channels were character-oriented and treated some characters, e.g., ETX, as control characters. Others assumed a stream of seven-bit characters, with values between 0 and 127; for example, the ASCII standard used only seven bits per character, avoiding an 8-bit representation in order to save on data transmission costs. On computers and data links using 8-bit bytes, this left the top bit of each byte free for use as a parity bit, flag bit, or metadata control bit. 7-bit systems and data links cannot directly handle the larger character codes that are commonplace in non-English-speaking countries with larger alphabets.
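As a minimal sketch of how the spare top bit could carry parity, the following Python functions pack a 7-bit code and an even-parity bit into one byte and strip the bit again. The function names are hypothetical, and even parity is an assumption chosen for illustration; real links used several conventions.

```python
def add_even_parity(code: int) -> int:
    """Pack a 7-bit code into a byte, using bit 7 as an even-parity bit."""
    if not 0 <= code <= 0x7F:
        raise ValueError("code must fit in 7 bits")
    ones = bin(code).count("1")   # number of set bits in the 7-bit code
    parity = ones & 1             # 1 when that count is odd
    return code | (parity << 7)   # the full byte now has an even bit count


def parity_ok(byte: int) -> bool:
    """True when the 8-bit value has an even number of set bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


def strip_parity(byte: int) -> int:
    """Discard bit 7, as a 7-bit receiver would after checking parity."""
    return byte & 0x7F
```

Because bit 7 is consumed by the check, a byte whose top bit is meaningful data cannot pass through such a link unchanged.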
Binary files of octets cannot be transmitted through 7-bit data channels directly. To work around this, binary-to-text encodings have been devised which use only 7-bit ASCII characters. Some of these encodings are uuencoding, Ascii85, SREC, BinHex, Kermit and MIME's Base64. EBCDIC-based systems cannot handle all characters used in uuencoded data. However, the Base64 encoding does not have this problem.
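The idea of a binary-to-text encoding can be illustrated with Base64 from Python's standard library. This is only a sketch: the helper names `to_seven_bit` and `is_seven_bit` are hypothetical, introduced here to show that the encoded form contains no byte with the high bit set and that decoding restores the original octets.

```python
import base64


def is_seven_bit(data: bytes) -> bool:
    """True when no byte in data has its high bit set."""
    return all(byte < 0x80 for byte in data)


def to_seven_bit(payload: bytes) -> bytes:
    """Encode arbitrary octets using only 7-bit ASCII characters."""
    return base64.b64encode(payload)


payload = bytes(range(256))          # every possible octet value
encoded = to_seven_bit(payload)      # safe for a 7-bit channel
restored = base64.b64decode(encoded)
```

The price of this safety is size: Base64 emits four output characters for every three input octets.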
Historically, various media were used to transfer messages, and some of them supported only 7-bit data, so throughout the 20th century an 8-bit message had a high chance of being garbled in transit. Some implementations, however, ignored the formal discouragement of 8-bit data and allowed bytes with the high bit set to pass through unchanged. Such implementations are said to be 8-bit clean. In general, a communications protocol is said to be 8-bit clean if it correctly passes through the high bit of each byte in the communication process.
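The definition above can be sketched as a simple probe: send every possible byte value through a channel and check that it comes back unchanged. The two channel functions below are hypothetical stand-ins, not models of any particular product; the first clears the high bit as a 7-bit link might, the second passes bytes through untouched.

```python
from typing import Callable


def seven_bit_channel(data: bytes) -> bytes:
    """Hypothetical link that clears the high bit of every byte."""
    return bytes(byte & 0x7F for byte in data)


def clean_channel(data: bytes) -> bytes:
    """Hypothetical link that passes every byte through unchanged."""
    return bytes(data)


def is_eight_bit_clean(channel: Callable[[bytes], bytes]) -> bool:
    """Probe a channel with all 256 byte values and compare the result."""
    probe = bytes(range(256))
    return channel(probe) == probe


# The UTF-8 encoding of "é" is the two bytes C3 A9; with the high bits
# cleared they become 43 29, which is the ASCII text "C)".
garbled = seven_bit_channel("é".encode("utf-8"))
```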
Many early communications protocol standards, such as RFC 780, 788, 821, 2821, 5321 (for SMTP), RFC 977 (for NNTP) and RFC 1056, were designed to work over such "7-bit" communication links. They specifically require the use of the ASCII character set "transmitted as an 8-bit byte with the high-order bit cleared to zero", and some of these [1] explicitly restrict all data to 7-bit characters.
For the first few decades of email networks (1971 to the early 1990s), most email messages were plain text in the 7-bit US-ASCII character set. [2]
The RFC 788 definition of SMTP, like its predecessor RFC 780, limits Internet Mail to lines (1000 characters or fewer) of 7-bit US-ASCII characters. [3] [4] [5] [6]
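A check for these two limits might look like the following sketch. The function name and message wording are hypothetical. It assumes lines are separated by CRLF and that the 1000-character limit includes the CRLF, and it ignores the dot-duplication used for transparency.

```python
MAX_LINE = 1000  # characters per line, counting the terminating CRLF


def classic_smtp_problems(message: bytes) -> list[str]:
    """List violations of the 7-bit and 1000-character line limits."""
    problems = []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        # +2 accounts for the CRLF removed by split(); a final segment
        # without a CRLF is counted the same way, which is conservative.
        if len(line) + 2 > MAX_LINE:
            problems.append(f"line {number}: longer than {MAX_LINE} characters")
        if any(byte > 0x7F for byte in line):
            problems.append(f"line {number}: contains a byte with the high bit set")
    return problems
```

A message of plain ASCII lines, each at most 998 characters before the CRLF, yields an empty list.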
Later, the format of email messages was redefined in order to support messages that are not entirely US-ASCII text (text messages in character sets other than US-ASCII, and non-text messages, such as audio and images). [6] The header field Content-Transfer-Encoding=binary [a] requires an 8-bit clean transport.
RFC 3977 [7] specifies that "NNTP operates over any reliable bi-directional 8-bit-wide data stream channel", and changes the character set for commands to UTF-8. However, RFC 5536 [8] still limits the character set to ASCII, including RFC 2047 [9] and RFC 2231 [10] MIME encoding of non-ASCII data.
The Internet community generally adds features by extension, allowing communication in both directions between upgraded machines and not-yet-upgraded machines, rather than declaring formerly standards-compliant legacy software to be "broken" and insisting that all software worldwide be upgraded to the latest standard. The recommended way to take advantage of 8-bit-clean links between machines is to use the ESMTP (RFC 1869) 8BITMIME extension [11] [12] for message bodies and the SMTPUTF8 [13] extension for message headers. Despite this, some mail transfer agents, notably Exim and qmail, relay mail to servers that do not advertise 8BITMIME without performing the conversion to 7-bit MIME (typically quoted-printable, "Q-P conversion") required by RFC 6152. This "just-send-8" attitude does not cause problems in practice, because virtually all modern email servers are 8-bit clean. [14]
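The negotiation described above can be sketched as a small decision function. This is an illustration only, not the logic of any particular mail transfer agent: `prepare_body` is a hypothetical name, and `server_extensions` is assumed to hold the keywords a server advertised in its EHLO response. Line-length and line-ending handling are left out for brevity.

```python
import quopri


def prepare_body(body: bytes, server_extensions: set[str]) -> tuple[str, bytes]:
    """Pick a Content-Transfer-Encoding label and the bytes to send."""
    if all(byte < 0x80 for byte in body):
        return "7bit", body                  # already safe everywhere
    if "8BITMIME" in server_extensions:
        return "8bit", body                  # server accepts high-bit bytes
    # Otherwise fall back to a 7-bit representation of the same octets.
    return "quoted-printable", quopri.encodestring(body)


text = "café".encode("utf-8")
label_new, sent_new = prepare_body(text, {"8BITMIME"})
label_old, sent_old = prepare_body(text, set())
```

A "just-send-8" agent corresponds to always taking the second branch, whether or not the keyword was advertised.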
The maximum total length of a text line including the <CRLF> is 1000 characters (but not counting the leading dot duplicated for transparency).
SMTP as defined in RFC 821 limits the sending of Internet Mail to US-ASCII characters.
Multipurpose Internet Mail Extensions, or MIME, redefines the format of messages