PackBits

Last updated

PackBits is a fast, simple lossless compression scheme for run-length encoding of data.

Apple introduced the PackBits format with the release of MacPaint on the Macintosh computer. This compression scheme can be used in TIFF files. TGA files also use this RLE compression scheme, but treats data stream as pixels instead of bytes.

A PackBits data stream consists of packets with a one-byte header followed by data. The header is a signed byte; the data can be signed, unsigned, or packed (such as MacPaint pixels).

In the following table, n is the value of the header byte as a signed integer.

Header byteData following the header byte
0 to 127(1 + n) literal bytes of data
−1 to −127One byte of data, repeated (1 − n) times in the decompressed output
−128No operation (skip and treat next byte as a header byte)

Note that interpreting 0 as positive or negative makes no difference in the output. Runs of two bytes adjacent to non-runs are typically written as literal data. There is no way based on the PackBits data to determine the end of the data stream; that is to say, one must already know the size of the compressed or uncompressed data before reading a PackBits data stream to know where it ends.

Apple Computer (see the external link) provides this short example of packed data: FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA

The following code, written in Microsoft VBA, unpacks the data:

SubUnpackBitsDemo()DimFileAsVariantDimMyOutputAsStringDimCountAsLongDimiAsLong,jAsLongFile="FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA"File=Split(File," ")Fori=LBound(File)ToUBound(File)Count=Application.WorksheetFunction.Hex2Dec(File(i))SelectCaseCountCaseIs>=128Count=256-Count'Two's ComplementForj=0ToCount'zero-basedMyOutput=MyOutput&File(i+1)&" "Nextji=i+1'Adjust the pointerCaseElseForj=0ToCount'zero-basedMyOutput=MyOutput&File(i+j+1)&" "Nextji=i+j'Adjust the pointerEndSelectNextiDebug.PrintMyOutput'AA AA AA 80 00 2A AA AA AA AA 80 00 2A 22 AA AA AA AA AA AA AA AA AA AAEndSub

The same implementation in JavaScript:

/** * Helper functions to create readable input and output *  * Also, see this fiddle for interactive PackBits decoder: * https://jsfiddle.net/y13xkh65/3/ */functionstr2hex(str){returnstr.split('').map(function(char){varvalue=char.charCodeAt(0);return((value<16?'0':'')+value.toString(16)).toUpperCase();}).join(' ');}functionhex2str(hex){returnhex.split(' ').map(function(string){returnString.fromCharCode(parseInt(string,16));}).join('');}/** * PackBits unpack function *  * @param {String} data * @return {String} */functionunpackBits(data){varoutput='',i=0;while(i<data.length){varhex=data.charCodeAt(i);if(hex==128){// Do nothing, nop}elseif(hex>128){// This is a repeated bytehex=256-hex;for(varj=0;j<=hex;++j){output+=data.charAt(i+1);}++i;}else{// These are literal bytesfor(varj=0;j<=hex;++j){output+=data.charAt(i+j+1);}i+=j;}++i;}returnoutput;}varoriginal='FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA',data=unpackBits(hex2str(original));// Output is: AA AA AA 80 00 2A AA AA AA AA 80 00 2A 22 AA AA AA AA AA AA AA AA AA AAconsole.log(str2hex(data));

Related Research Articles

GIF Bitmap image file format family

The Graphics Interchange Format is a bitmap image format that was developed by a team at the online services provider CompuServe led by American computer scientist Steve Wilhite and released on 15 June 1987. It has since come into widespread usage on the World Wide Web due to its wide support and portability between applications and operating systems.

The MD5 message-digest algorithm is a cryptographically broken but still widely used hash function producing a 128-bit hash value. Although MD5 was initially designed to be used as a cryptographic hash function, it has been found to suffer from extensive vulnerabilities. It can still be used as a checksum to verify data integrity, but only against unintentional corruption. It remains suitable for other non-cryptographic purposes, for example for determining the partition for a particular key in a partitioned database, and may be preferred due to lower computational requirements than more recent Secure Hash Algorithms.

Lempel–Ziv–Welch (LZW) is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations. It is the algorithm of the widely used Unix file compression utility compress and is used in the GIF image format.

TIFF Series of image file formats

Tag Image File Format, abbreviated TIFF or TIF, is an image file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers. TIFF is widely supported by scanning, faxing, word processing, optical character recognition, image manipulation, desktop publishing, and page-layout applications. The format was created by the Aldus Corporation for use in desktop publishing. It published the latest version 6.0 in 1992, subsequently updated with an Adobe Systems copyright after the latter acquired Aldus in 1994. Several Aldus or Adobe technical notes have been published with minor extensions to the format, and several specifications have been based on TIFF 6.0, including TIFF/EP, TIFF/IT, TIFF-F and TIFF-FX.

In computer programming, the term magic number has multiple meanings. It could refer to one of the following:

uuencoding is a form of binary-to-text encoding that originated in the Unix programs uuencode and uudecode written by Mary Ann Horton at UC Berkeley in 1980, for encoding binary data for transmission in email systems.

In classic Mac OS System 7 and later, and in macOS, an alias is a small file that represents another object in a local, remote, or removable file system and provides a dynamic link to it; the target object may be moved or renamed, and the alias will still link to it. In Windows, a "shortcut", a file with a .lnk extension, performs a similar function.

A dictionary coder, also sometimes known as a substitution coder, is a class of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure.

PAQ is a series of lossless data compression archivers that have gone through collaborative development to top rankings on several benchmarks measuring compression ratio. Specialized versions of PAQ have won the Hutter Prize and the Calgary Challenge. PAQ is free software distributed under the GNU General Public License.

Color BASIC is the implementation of Microsoft BASIC that is included in the ROM of the Tandy/Radio Shack TRS-80 Color Computers manufactured between 1980 and 1991. BASIC is a high level language with simple syntax that makes it easy to write simple programs. Color BASIC is interpreted, that is, decoded as it is run.

Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form. It is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. Some also use it as a container format holding packets of stream data. Common file extensions used for the resulting files are .HEX or .H86. The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution.

The Apple Icon Image format is an icon format used in Apple Inc.'s macOS. It supports icons of 16 × 16, 32 × 32, 48 × 48, 128 × 128, 256 × 256, 512 × 512 points at 1x and 2x scale, with both 1- and 8-bit alpha channels and multiple image states. The fixed-size icons can be scaled by the operating system and displayed at any intermediate size.

Silicon Graphics Image (SGI) or the RGB file format is the native raster graphics file format for Silicon Graphics workstations. The format was invented by Paul Haeberli. It can be run-length encoded (RLE). FFmpeg and ImageMagick, among others, support this format.

This is an overview of Fortran 95 language features. Included are the additional features of TR-15581:Enhanced Data Type Facilities, which have been universally implemented. Old features that have been superseded by new ones are not described – few of those historic features are used in modern programs although most have been retained in the language to maintain backward compatibility. The current standard is Fortran 2018; many of its new features are still being implemented in compilers. The additional features of Fortran 2003, Fortran 2008 and Fortran 2018 are described by Metcalf, Reid and Cohen.

SREC (file format)

Motorola S-record is a file format, created by Motorola in the mid-1970s, that conveys binary information as hex values in ASCII text form. This file format may also be known as SRECORD, SREC, S19, S28, S37. It is commonly used for programming flash memory in microcontrollers, EPROMs, EEPROMs, and other types of programmable logic devices. In a typical application, a compiler or assembler converts a program's source code to machine code and outputs it into a HEX file. The HEX file is then imported by a programmer to "burn" the machine code into non-volatile memory, or is transferred to the target system for loading and execution.

In computing, a bitmap is a mapping from some domain to bits. It is also called a bit array or bitmap index.

ZPAQ is an open source command line archiver for Windows and Linux. It uses a journaling or append-only format which can be rolled back to an earlier state to retrieve older versions of files and directories. It supports fast incremental update by adding only files whose last-modified date has changed since the previous update. It compresses using deduplication and several algorithms depending on the data type and the selected compression level. To preserve forward and backward compatibility between versions as the compression algorithm is improved, it stores the decompression algorithm in the archive. The ZPAQ source code includes a public domain API, libzpaq, which provides compression and decompression services to C++ applications. The format is believed to be unencumbered by patents.

BLAKE is a cryptographic hash function based on Daniel J. Bernstein's ChaCha stream cipher, but a permuted copy of the input block, XORed with round constants, is added before each ChaCha round. Like SHA-2, there are two variants differing in the word size. ChaCha operates on a 4×4 array of words. BLAKE repeatedly combines an 8-word hash value with 16 message words, truncating the ChaCha result to obtain the next hash value. BLAKE-256 and BLAKE-224 use 32-bit words and produce digest sizes of 256 bits and 224 bits, respectively, while BLAKE-512 and BLAKE-384 use 64-bit words and produce digest sizes of 512 bits and 384 bits, respectively.

The write is one of the most basic routines provided by a Unix-like operating system kernel. It writes data from a buffer declared by the user to a given device, such as a file. This is the primary way to output data from a program by directly using a system call. The destination is identified by a numeric code. The data to be written, for instance a piece of text, is defined by a pointer and a size, given in number of bytes.

Although C++ is one of the most widespread programming languages, many prominent software engineers criticize C++ for being overly complex, and fundamentally flawed. C++ was widely adopted, and implemented, as a systems language through most of its existence, it has been used to build many pieces of very important software.