BinHex

BinHex 4
Filename extension
.hqx
Internet media type
application/mac-binhex40
application/mac-binhex
application/binhex
Uniform Type Identifier (UTI) com.apple.binhex-archive

BinHex, originally short for "binary-to-hexadecimal", is a binary-to-text encoding system that was used on the classic Mac OS for sending binary files through e-mail. The original version was a hexadecimal encoding; later versions are more similar to uuencode, but combine both "forks" of a Mac file, along with extended file information, into a single file. BinHexed files take up more space than the originals, but are not corrupted by software that is not "8-bit clean".


History

TRS-80 BinHex (.hex)

BinHex was originally written in 1981 by Tim Mann for the TRS-80 as a standalone version of an encoding scheme originally built into a popular terminal emulator, ST80-III by Lance Micklus. BinHex was used for sending files via major online services, such as CompuServe, which were not "8-bit clean" and required ASCII armoring to survive. Not everyone used ST-80, however, so Mann wrote BinHex to allow users of other terminals to use the format. [1]

The original ST-80 system worked by converting the binary file contents to hexadecimal numbers, which were encoded as ASCII digits and letters (0–9, A–F). It then added a newline after every 60 characters. The system became very popular after Mann uploaded it to CompuServe's TRS-80 files area. The system quickly gained the addition of a checksum at the end of every line to check for errors. Bill Stockwell converted that version to the BASIC/S compiler, which ran much faster than Mann's interpreted version. [1]
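The hexadecimal scheme described above can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, the use of uppercase digits is an assumption, and the per-line checksum is left out because its exact format is not given in this article.

```python
def hex_armor(data: bytes, width: int = 60) -> str:
    """Encode bytes as hex digits and break the result into fixed-width lines."""
    digits = data.hex().upper()  # two ASCII characters per input byte
    lines = [digits[i:i + width] for i in range(0, len(digits), width)]
    return "\n".join(lines)


def hex_unarmor(text: str) -> bytes:
    """Reverse hex_armor: drop the line breaks and convert digit pairs to bytes."""
    return bytes.fromhex("".join(text.split()))
```

Each 60-character line carries 30 bytes of the original file, which is why the encoded output is twice the size of the input before line breaks are counted.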

BinHex files of the era were typically given the file extension .hex. Ports soon appeared for other popular platforms of the era, including the Apple II. CompuServe later added support for 8-bit transfers, and the format quickly disappeared. [1]

Mac BinHex (.hex)

The file upload problem still existed on CompuServe when the Mac was first released in 1984. In April 1984, William Davis ported BinHex to the Mac using Microsoft BASIC to produce a version that was largely identical to the TRS-80 versions of the same era. [1] This version only supported encoding of the "data fork", ignoring the resource fork, which meant it could only be used for data files. The rise in use of Internet e-mail coincided roughly with the release of the Macintosh, and Davis's version was posted on the Info-Mac mailing list by Joel Heller in June 1984. Several newer versions were published during 1984, resulting in BinHex 3 that could encode both forks.

Yves Lempereur, author of the first assembler for the Mac, MacASM, found that in order to upload his files to CompuServe he had to use BinHex. The BASIC version was very slow, so Lempereur ported BinHex 3 to assembler and released it as BinHex 1.0. The program was roughly a hundred times as fast as the BASIC version, and soon upgrade requests were flooding in. [2]

Compact BinHex (.hcx)

The original BinHex was a fairly simple format, and not a very efficient one: the hexadecimal representation expanded every byte of input into two characters, an 8-to-4 bit encoding. For BinHex 2.0, Lempereur used a new 8-to-6 encoding that emits four characters for every three bytes of input, cutting the encoded size by about a third. He also took the opportunity to expand the checksum from 8 to 16 bits. [2]
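The size difference between the two encodings can be checked with a little arithmetic. The sketch below counts only the encoded characters and ignores line breaks, checksums and headers; the helper name and the 3000-byte example are chosen purely for illustration.

```python
import math


def encoded_chars(n_bytes: int, bits_per_char: int) -> int:
    """Characters needed to carry n_bytes of data at bits_per_char bits each."""
    return math.ceil(n_bytes * 8 / bits_per_char)


n = 3000                         # example input size in bytes
hex_len = encoded_chars(n, 4)    # 8-to-4 (hexadecimal) encoding
six_len = encoded_chars(n, 6)    # 8-to-6 encoding
saving = 1 - six_len / hex_len   # fraction by which the output shrinks
```

For this input the hexadecimal form needs 6000 characters and the 6-bit form 4000, so the output shrinks by one third while the overhead relative to the original file falls from 100% to about 33%.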

This new encoding used the first 64 ASCII printing characters, including the space, to represent the data, [3] similarly to uuencode. Even though the new encoding was no longer hexadecimal in nature, the established name of the program was retained. The smaller files were incompatible with the older ones, so the extension became .hcx, c for compact. The new version replaced the earlier ones "overnight". [2]
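The mapping between 6-bit values and characters can be sketched as follows. The decoding expression is the one quoted in note 3 from macutils; the encoding direction is an assumption derived from it and from the statement that the alphabet starts at the space character, and both function names are made up for this example.

```python
def hcx_digit_to_value(ch: str) -> int:
    """Numeric value of one HCX digit, using the expression quoted in note 3."""
    return (ord(ch) - 0x20) & 0x3F


def value_to_hcx_digit(value: int) -> str:
    """Assumed inverse: value 0 maps to the space, value 63 to '_'."""
    if not 0 <= value <= 63:
        raise ValueError("a 6-bit value must be in the range 0..63")
    return chr(0x20 + value)
```

How the 6-bit values are packed into bytes in an .hcx file is not covered here, since the article does not describe it.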

BinHex 4 (.hqx)

Lempereur had concerns about some of the features of BinHex, notably its use of a checksum instead of a cyclic redundancy check (CRC) and the fact that the metadata information in the header was in plain text and thus could be corrupted in the same way as the data. [2]

To address these problems, Lempereur released BinHex 4.0 in 1985, skipping 3.0 to avoid confusion with the now long-dead BASIC version. Version 4.0 first combined the data fork, resource fork and file metadata into a common 8-bit format, applied run-length encoding (RLE) to the result to provide some compression, then ran the 8-to-6 conversion and protected everything with multiple CRCs. The resulting .hqx files were roughly the same size as .hcx files, but much more robust. [2]
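The run-length step can be illustrated with a small expander. The details are not given in this article; the sketch assumes the scheme described in RFC 1741, in which the byte 0x90 marks a run, the byte after it gives the total repeat count of the preceding byte, and the pair 0x90 0x00 stands for a literal 0x90. The function name is hypothetical.

```python
RLE_MARKER = 0x90  # assumed run marker, per RFC 1741


def rle_expand(data: bytes) -> bytes:
    """Expand run-length encoded data (sketch, not a full BinHex decoder)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte != RLE_MARKER:
            out.append(byte)  # ordinary byte, copied through
            continue
        if i >= len(data):
            raise ValueError("run marker at end of input")
        count = data[i]
        i += 1
        if count == 0:
            out.append(RLE_MARKER)  # 0x90 0x00 encodes a literal 0x90
        elif not out:
            raise ValueError("run marker with no byte to repeat")
        else:
            # The count includes the copy that was already written.
            out.extend(out[-1:] * (count - 1))
    return bytes(out)
```

Under these assumptions the sequence 11 22 90 06 33 expands to 11 followed by six copies of 22 and then 33.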

BinHex 5

At about the time BinHex 4 was released, most online services started supporting robust 8-bit file transfer protocols such as ZMODEM, and the need for ASCII armoring went away. This left a problem on the Mac, however, as there was still the need to encode the two forks into one.

A team effort among Macintosh communications programmers, including Lempereur, resulted in MacBinary. These .bin files left the contents of the forks in their original 8-bit format and added a simple header for combining them on reception; MacBinary files were thus much smaller than BinHex. Lempereur released BinHex 5.0, almost identical to 4.0 with the exception that it used MacBinary to combine the forks before running the 8-to-6 encoding. This saw little use, as he expected. [2]

On the Internet, e-mail was still the primary method of moving files. At the time, relatively few people had full access to the Internet, and services like FTPmail were the only way many users could download files. Years later when he first got onto the Internet, Lempereur was surprised to find that BinHex 4.0 was still extremely popular. [2]

The same ends could be achieved by first using MacBinary or AppleSingle to combine the forks, and then using uuencode or Base64 on the resulting file, but none of these combinations ever became popular, and BinHex 4.0 survived well into the late 1990s. File archives of classic Mac OS software are still filled with BinHexed files.

BinHex 4 file format

Looking at the contents of a BinHex file, one will notice that it has a message usually on the first line identifying it as BinHex, followed by many 64-character lines made up of seemingly random letters, numbers, and punctuation marks. Here is a sample of what BinHex actually looks like:

    (This file must be converted with BinHex 4.0)
    :$f*TEQKPH#jdCA0d,R0TG!"6594%8dP8)3#3"!&m!*!%EMa6593K!!%!!!&mFNa
    KG3,r!*!$&[rr$3d,BQPZD'9i,R4PFh3!RQ+!!"AV#J#3!i!!N!@QKUjrU!#3'[q
    3"&4&@&483N)f!3#Xaj6bV-H8mJ!!!B3!N!0"!*!$[3#3!cR@iiY)!*!'[I%4!!J
    Fp$X%X3@J!mZE6!GRiKUi$HGKMf0U61S46%i1"AB!TI,fLl!d1X3RDDE8ALfTCbM
    8UP9p4iUqY-0k4krHpk9XK@`rbj2Ti'U@5rGH@+[fr-i4T6-qXpfl26,k!H5$Nml
    TIkI'(l3GI4)f8mII&01CNEbC2LrNLBeaZ1HG@$G8!Z6"k)hh,q9p"r6FC*!!Se"
    (ic,Pd(4(b`pflKC`H1&JN5)GVX3mREdH55[l`%`Yhp%q092c`A(hPV)!83Dr&f4
    $$L#I1aM-"VjqV-q$34KQq6$M$f8#,Zc,i),!(`*ZN!$K$rS!LA%3cL+dYi"@,K(
    Z"`#3!fKi!!!:

The file must contain the text line (This file must be converted with BinHex 4.0), which users and tools rely on to recognize the BinHex version. Any text before this line is to be ignored. [4]

The rest of the file consists of three parts: a header (containing the file name, size, etc.), a data fork (containing the file data) and a resource fork. Each part has its own two-byte CRC.
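Checking one of these two-byte CRCs might look like the sketch below. It relies on binascii.crc_hqx from the Python standard library, which is documented as computing the CRC used by the binhex4 format. The initial value of 0, the big-endian byte order of the stored CRC, and the two function names are assumptions made for this example.

```python
import binascii


def section_crc(body: bytes) -> int:
    """16-bit CRC of one decoded part (header, data fork or resource fork)."""
    return binascii.crc_hqx(body, 0)


def check_section(section: bytes) -> bool:
    """Verify a part whose last two bytes are assumed to hold its CRC, big-endian."""
    if len(section) < 2:
        raise ValueError("section is too short to contain a CRC")
    body, stored = section[:-2], section[-2:]
    return section_crc(body) == int.from_bytes(stored, "big")
```

The check applies to the decoded binary data, that is, after the 6-bit and run-length layers have been undone.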

Everything except the (This file... line is treated as a block of binary data, which is encoded as ASCII characters. The encoding divides every three input bytes into four 6-bit values, much as Base64 does. The values 0–63 are mapped to characters according to the following list: !"#$%&'()*+,-012345689@ABCDEFGHIJKLMNPQRSTUVXYZ[`abcdefhijklmpqr
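The 6-bit conversion can be sketched as follows, using the character list given above. The bit order (most significant bits first) and the zero-bit padding of a final partial group are assumptions, and the function names are invented for illustration. Run-length encoding, CRCs and line framing are not handled here.

```python
HQX_ALPHABET = (
    "!\"#$%&'()*+,-012345689@ABCDEFGHIJKLMN"
    "PQRSTUVXYZ[`abcdefhijklmpqr"
)
assert len(HQX_ALPHABET) == 64


def encode_6bit(data: bytes) -> str:
    """Turn bytes into characters, six bits at a time, most significant first."""
    out = []
    acc = 0    # bit accumulator
    nbits = 0  # number of unconsumed bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            out.append(HQX_ALPHABET[(acc >> nbits) & 0x3F])
        acc &= (1 << nbits) - 1  # keep only the leftover bits
    if nbits:
        # Pad the final partial group with zero bits (assumed behaviour).
        out.append(HQX_ALPHABET[(acc << (6 - nbits)) & 0x3F])
    return "".join(out)


def decode_6bit(text: str) -> bytes:
    """Reverse encode_6bit; raises ValueError on characters outside the list."""
    out = bytearray()
    acc = 0
    nbits = 0
    for ch in text:
        acc = (acc << 6) | HQX_ALPHABET.index(ch)
        nbits += 6
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out)
```

As a check against the sample above, decode_6bit("$f*T") gives the bytes 0x0F, 'b' and 'i'.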

When encoding, a <return> should be inserted after every 64 characters. After encoding, a colon is placed before and after the data.
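The framing rules can be sketched like this. The example assumes that the colons count toward the 64-character line width and uses "\n" as the line ending; both choices, like the function names, are assumptions made for illustration.

```python
HQX_BANNER = "(This file must be converted with BinHex 4.0)"


def frame_hqx(encoded: str, width: int = 64) -> str:
    """Wrap already-encoded characters in colons and break them into lines."""
    body = ":" + encoded + ":"
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return HQX_BANNER + "\n" + "\n".join(lines) + "\n"


def unframe_hqx(text: str) -> str:
    """Return the encoded characters between the colons, ignoring leading text."""
    start = text.index(HQX_BANNER) + len(HQX_BANNER)
    first = text.index(":", start)
    last = text.index(":", first + 1)
    # Remove the line breaks that were inserted for transport.
    return "".join(text[first + 1:last].split())
```

Because the colon is not one of the 64 encoding characters, the first colon after the identifying line and the next one after it delimit the data unambiguously.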

References

  1. Mann.
  2. Lempereur 1997.
  3. For example, the source code of the CWI version of hexbin included in macutils, in hecx.c line 187, uses the expression ((c)-0x20) & 0x3f to obtain the numerical value of an HCX digit with the ASCII value c.
  4. Faltstrom, P.; Crocker, D.; Fair, E. (December 1994). MIME Content Type for BinHex Encoded Files. RFC 1741.

Bibliography

See also