yEnc


yEnc is a binary-to-text encoding scheme for transferring binary files in messages on Usenet or via e-mail. It reduces overhead relative to earlier US-ASCII-based encodings by using an 8-bit encoding method. When each byte value appears with roughly equal frequency, yEnc's overhead is often as little as 1–2%, [1] compared to 33–40% for 6-bit encodings such as uuencode and Base64. yEnc was initially developed by Jürgen Helbing and first released in early 2001. By 2003 it had become the de facto standard encoding for binary files on Usenet. [2] The name yEncode is a play on "Why encode?", since the idea is to encode a character only when that is strictly required to comply with the message format standard. [3]


How yEnc works

Usenet and email message bodies were intended to contain only ASCII characters (RFC 822 or RFC 2822). Most competing encodings represent binary files by converting them into printable ASCII characters, because the printable ASCII range is supported by most operating systems. However, since this shrinks the available character set considerably, it wastes significant bandwidth on networks that carry 8-bit bytes. For example, uuencode and Base64 encode three bytes of data as four printable ASCII characters, which occupy four bytes: a 33% overhead (not including the overhead from headers). yEnc uses one character (one byte) to represent one byte of the file, with a few exceptions.
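The three-to-four expansion described above can be checked with Python's standard `base64` module. This is only an illustrative measurement on an arbitrary 768-byte sample chosen here; it ignores headers and line breaks.

```python
import base64

# Arbitrary sample: 768 bytes, a multiple of 3 so no "=" padding is needed.
data = bytes(range(256)) * 3

encoded = base64.b64encode(data)

# Every 3 source bytes become 4 printable ASCII characters.
overhead = len(encoded) / len(data) - 1

print(len(data), len(encoded))          # 768 1024
print(f"Base64 overhead: {overhead:.1%}")  # Base64 overhead: 33.3%
```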

yEnc assumes that most binary data can be transmitted through Usenet and email as is. Therefore, 252 of the 256 possible byte values are sent as a single byte without escaping, whether or not the result is a printable ASCII character. Only NUL, LF, CR, and = are escaped. LF and CR are escaped because the RFCs that define Internet messages still give carriage returns and line feeds special meaning in a mail message. = is the escape character, so it must itself be escaped. NUL is escaped because common code handles null characters poorly. As an optimization, yEnc adds 42 to every source byte before checking for these values, so that long runs of zero bytes, which are not uncommon, do not require a lot of escaping.
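The byte-level rule can be sketched in Python as follows. This is a minimal illustration, not a complete yEnc implementation: it omits the header and trailer lines, line splitting, and checksums. The function names are hypothetical, and the escape offset of 64 is taken from the yEnc draft specification rather than from the description above.

```python
ESCAPE = ord("=")
# Encoded values that must not appear literally: NUL, LF, CR and "=".
CRITICAL = {0x00, 0x0A, 0x0D, ESCAPE}


def yenc_encode_bytes(data: bytes) -> bytes:
    """Apply the +42 shift and escape the four critical values."""
    out = bytearray()
    for b in data:
        e = (b + 42) % 256
        if e in CRITICAL:
            # Escaped form: "=" followed by the value shifted by a further 64
            # (offset per the yEnc draft; an assumption relative to this text).
            out.append(ESCAPE)
            e = (e + 64) % 256
        out.append(e)
    return bytes(out)


def yenc_decode_bytes(encoded: bytes) -> bytes:
    """Reverse yenc_encode_bytes; expects line breaks already stripped."""
    out = bytearray()
    it = iter(encoded)
    for e in it:
        if e == ESCAPE:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("truncated escape sequence")
            e = (nxt - 64) % 256
        out.append((e - 42) % 256)
    return bytes(out)


sample = bytes(range(256))            # every byte value exactly once
packed = yenc_encode_bytes(sample)
print(len(sample), len(packed))       # 256 260
print(yenc_encode_bytes(b"\x00" * 4))  # b'****'
```

With every byte value occurring once, only four values need escaping, so 256 source bytes become 260 encoded bytes, about 1.6% overhead before line breaks and headers are counted. The run of zero bytes encodes to plain `*` characters with no escapes, which is the effect of the +42 shift.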

There is no RFC or other standards document describing yEnc. [4] The yEnc homepage contains an informal[citation needed] draft specification and a grammar (which contradict RFC 2822 and RFC 2045),[citation needed] although neither has been submitted to the Internet Engineering Task Force.[citation needed]

As with uuencoding, despite its flaws, yEnc remains[when?] active and effective on Usenet.[citation needed] The yEnc homepage states that "all major newsreaders have been extended to yEnc support". Microsoft's Outlook Express, Windows Mail and Windows Live Mail do not provide yEnc support for either news or mail, but plug-ins are available. Mozilla Thunderbird will decode single-part yEnc files, but cannot combine multi-part binaries. [5]

Problems

Many programmers and news admins have outlined the weaknesses of yEnc. [6] [7] [8] [9] It shares many of uuencode's flaws, a number of which MIME had already solved years earlier. For example, yEnc requires the strings "=ybegin" and "=yend" to be placed around the encoded file in the message body. [3] Although this is an improvement over uuencode's "begin" and "end", which occur more frequently in normal text, message readers can still encounter the strings outside of attachments (most frequently in discussions about yEnc itself). yEnc and uuencode[citation needed] also attempt to reassemble files split into multiple messages by using the subject line, which is unreliable.[according to whom?]
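The in-band marker problem can be illustrated with a deliberately naive scanner. The function below is a hypothetical sketch, not how any particular newsreader works: it treats any line starting with "=ybegin" as the start of an attachment and the next line starting with "=yend" as its end.

```python
def find_yenc_blocks(body: str) -> list[tuple[int, int]]:
    """Return (start, end) line indexes of naive =ybegin ... =yend spans."""
    blocks = []
    start = None
    for i, line in enumerate(body.splitlines()):
        if line.startswith("=ybegin") and start is None:
            start = i
        elif line.startswith("=yend") and start is not None:
            blocks.append((start, i))
            start = None
    return blocks


# A plain-text post that merely talks about yEnc framing.
discussion = "\n".join([
    "A yEnc attachment is framed like this:",
    "=ybegin",
    "(encoded data would go here)",
    "=yend",
    "That is all there is to it.",
])

print(find_yenc_blocks(discussion))  # [(1, 3)]
```

The scanner reports an attachment in a message that contains none, because the markers live in the same text stream as ordinary prose.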

yEncode adoption

The yEncode draft proposal document was made available on 31 July 2001. [10] A reference encoder and decoder were included in the freeware version of MyNews 1.9 in November that year. [11] yDec, a freeware Win32 decoder, followed on 14 November 2001. On 21 March 2002, Agent added yEnc support with version 1.91. [12] [13] The release had been postponed by one week in response to feedback from Jürgen Helbing. [14] [15] A couple of days after the release, Helbing wrote that Forté had implemented yEnc in the best way imaginable. [16]

StuffIt Deluxe added yEnc support with version 8.0 in 2003. [17] [18] PowerArchiver 9.2 added yEnc support in May 2005. [19]


References

  1. Helbing, Juergen (28 February 2002). "yEncode - A quick and dirty encoding for binaries". Retrieved 26 July 2014.
  2. Fellows, G. (2006). "Newsgroups reborn – The binary posting renaissance". Digital Investigation. 3 (2): 73–78. doi:10.1016/j.diin.2006.04.006. ISSN 1742-2876.
  3. Kim, Juhoon; Schneider, Fabian; Ager, Bernhard; Feldmann, Anja (2010). "Today's Usenet Usage: NNTP Traffic Characterization". 2010 INFOCOM IEEE Conference on Computer Communications Workshops. pp. 1–6. CiteSeerX 10.1.1.679.6023. doi:10.1109/INFCOMW.2010.5466665. ISBN 978-1-4244-6739-6. S2CID 18282467.
  4. High Definition: An A to Z Guide to Personal Technology. 2006. p. 353. ISBN 978-0618714896. While there are no official standards for yEnc, it is widely used for posting binary files on newsgroups
  5. "Yenc support in Thunderbird?". org.mozilla.lists.support-thunderbird (Mailing list). 18 May 2006.
  6. Helbing, Jürgen (10 June 2002). "Opponents to yEnc". yenc.org. Archived from the original on 28 August 2013.
  7. Nixon, Jeremy (17 March 2002). "Why yEnc is bad for Usenet". Archived from the original on 29 August 2013.
  8. Welch, Curt (19 September 2002). "What's wrong with yEnc?". Archived from the original on 11 August 2013.
  9. Färber, Claus (4 March 2002). "yEnc considered harmful". Archived from the original on 1 June 2009.
  10. "The original draft yEnc 1.0". 31 July 2001.
  11. Juergen "The Archiver" Helbing. "New features in 1.9". winews.net.
  12. "Agent 1.91 is Released". forteinc.com. Agent 1.91 provides full support for yEnc, a new Usenet encoding algorithm for binaries.
  13. Spanbauer, Scott (August 2002). "Revision control - Latest Software Tweaks (Listen to a world of radio stations on the Internet)". PC World. 20 (8): 138–139. Version 1.92 of Forté's Usenet newsreader adds a trash folder, improves some existing features, and takes care of various bugs; but more important than the fixes and enhancements is the application's added support for the YEnc binary encoding algorithm.
  14. "Agent 1.91 needs one more week". Forté. 15 March 2002.
  15. "Juergen Helbing's feedback on yEnc and Agent 1.91". Forté. 17 March 2002.
  16. Helbing, Jürgen (22 March 2002). "Forte Agent 1.91 supports yEnc".
  17. Sellers, Dennis (22 September 2003). "StuffIt Deluxe 8.0 gets new plug-ins, performance boost". Macworld.
  18. Breen, Christopher (July 2004). "StuffIt Deluxe 8.0". Macworld. 21 (7): 40.
  19. Richard V. Dragan (4 May 2005). "File Compression: PowerArchiver 9.2".