LZX

LZX is an LZ77-family compression algorithm, a slightly improved version of DEFLATE. [1] It is also the name of a file archiver. Both were invented by Jonathan Forbes and Tomi Poutanen in the 1990s.

Instances of use of the LZX algorithm

Amiga LZX

LZX was publicly released as an Amiga file archiver in 1995, while the authors were studying at the University of Waterloo in Canada. The software was shareware, which was common for compression software at the time. The registered version contained fixes and improvements that were not available in the evaluation version. In 1997, the authors gave away a free keyfile, which allowed anyone to use the registered version, as they had stopped work on the archiver and stopped accepting registrations.

Microsoft Cabinet files

In 1996, Forbes went to work for Microsoft, [2] and Microsoft's cabinet archiver was enhanced to include the LZX compression method. Improvements included a variable search window size: Amiga LZX used a fixed 64 KB window, whereas Microsoft LZX could use any power of two from 32 to 2048 kilobytes (32,768 to 2,097,152 bytes). A special preprocessor was added to detect Intel 80x86 "CALL" instructions and convert their operands from relative to absolute addresses, so that calls to the same location produced repeated strings that the compressor could match, improving compression of 80x86 binary code. (This technique was later generalized as Branch-Call-Jump [BCJ] filtering.)
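
To make the CALL preprocessing idea concrete, here is a simplified Python sketch. It is not Microsoft's implementation: the real LZX translation also limits which offsets it rewrites and works within fixed-size frames, and the function names (`e8_encode`, `e8_decode`) are hypothetical. It assumes the buffer is loaded at address 0 and turns each CALL displacement into the absolute target address, then back again.

```python
# Simplified sketch of an x86 CALL (opcode 0xE8) translation filter.
# Assumption: this is NOT the exact LZX rule (which also range-checks
# offsets and works per frame); it only shows relative -> absolute.

MASK32 = 0xFFFFFFFF


def _translate_calls(data: bytes, encode: bool) -> bytes:
    out = bytearray(data)
    i = 0
    # A CALL rel32 instruction is 5 bytes: 0xE8 followed by a 32-bit
    # little-endian displacement measured from the next instruction.
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            value = int.from_bytes(out[i + 1:i + 5], "little")
            next_ip = i + 5  # assumes the buffer is loaded at address 0
            if encode:
                value = (value + next_ip) & MASK32  # relative -> absolute
            else:
                value = (value - next_ip) & MASK32  # absolute -> relative
            out[i + 1:i + 5] = value.to_bytes(4, "little")
            i += 5  # skip the operand so both directions scan identically
        else:
            i += 1
    return bytes(out)


def e8_encode(data: bytes) -> bytes:
    return _translate_calls(data, encode=True)


def e8_decode(data: bytes) -> bytes:
    return _translate_calls(data, encode=False)
```

Two calls at different positions that target the same address end up with identical operand bytes after `e8_encode`, which is the repetition an LZ77 matcher can exploit. `e8_decode` restores the original bytes because both directions skip over each rewritten operand in the same way.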

Microsoft Compiled HTML Help (CHM) files

When Microsoft introduced Microsoft Compiled HTML Help, the replacement for its classic Help file format, it chose to compress all of the HTML data with the LZX algorithm. However, to improve random-access speed, the compressor was altered to reset itself after every 64-kilobyte (65,536-byte) interval and to re-align to a 16-bit boundary after every 32-kilobyte interval. The HTML Help software could therefore seek directly to the nearest preceding 64-kilobyte reset point and start decoding there, instead of always decoding from the beginning of the compressed data stream.
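
The benefit of periodic resets can be sketched as a seek planner. This assumes a hypothetical `reset_table` listing, for each 64 KB block of uncompressed output, the compressed offset where decoding may restart; it illustrates the idea only and does not reproduce the actual CHM on-disk structures.

```python
# Hypothetical seek planner for a stream whose compressor resets every
# 64 KB of uncompressed data. `reset_table[k]` is assumed to hold the
# compressed offset where decoding of uncompressed byte k * 64 KB can start.

RESET_INTERVAL = 0x10000  # 64 KB (65,536 bytes) of uncompressed data


def plan_seek(target: int, reset_table: list[int]) -> tuple[int, int, int]:
    """Return (compressed_start, uncompressed_start, bytes_to_discard)."""
    if target < 0:
        raise ValueError("target offset must be non-negative")
    block = target // RESET_INTERVAL
    if block >= len(reset_table):
        raise ValueError("target lies beyond the end of the reset table")
    uncompressed_start = block * RESET_INTERVAL
    # Decode from the reset point, discarding output until `target` is reached.
    return reset_table[block], uncompressed_start, target - uncompressed_start
```

At most 65,535 bytes ever need to be decoded and discarded, regardless of how far into the file the target lies.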

Microsoft Reader (LIT) files

Microsoft LIT files for Microsoft Reader are simply an extension of the CHM file format, and thus also use LZX compression.

Windows Imaging Format (WIM) files

Windows Imaging Format, the installation and drive-image file format of Windows Vista and Windows 7, uses LZX as one of its compression methods. [3]

CompactOS NTFS file compression

In Windows 10, LZX compression from Windows Imaging Format is used for the new CompactOS NTFS file compression.

Xbox Live Avatars

Microsoft uses LZX compression on Xbox Live Avatars to reduce their disk and bandwidth requirements. [4]

Decompressing LZX files

The unlzx program and XAD can unpack Amiga LZX archives. The cabextract program can unpack Microsoft cabinet files that use the LZX method. [5] Many cross-platform tools exist for decompiling or viewing CHM files. LIT files can be unpacked using the Convert LIT software. [6]

See also

Related Research Articles

A file archiver is a computer program that combines a number of files together into one archive file, or a series of archive files, for easier transportation or storage. File archivers may employ lossless data compression in their archive formats to reduce the size of the archive.

In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by removing unnecessary or less important information. Typically, a device that performs data compression is referred to as an encoder, and one that performs the reversal of the process (decompression) as a decoder.

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.
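
A toy LZ77 coder shows how removing redundancy can still allow perfect reconstruction. This is a simplified sketch with a brute-force match search and a hypothetical (distance, length, literal) token format; it is neither DEFLATE nor LZX, both of which add entropy coding and much faster match finding.

```python
# Toy LZ77 coder emitting (distance, length, next_byte) triples.
# Teaching sketch only: brute-force search, no entropy coding.


def lz77_encode(data: bytes, window: int = 4096, max_len: int = 255):
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Matches may overlap the current position; one byte is always
            # left over so it can be emitted as the literal.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        tokens.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for dist, length, literal in tokens:
        start = len(out) - dist
        for k in range(length):
            out.append(out[start + k])  # byte-by-byte copy handles overlap
        out.append(literal)
    return bytes(out)
```

For the input `b"abcabcabcabcX"`, the encoder emits three literals followed by a single token that copies nine bytes from distance 3 and appends `X`, and decoding returns the original bytes exactly.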

Windows Media Audio (WMA) is a series of audio codecs and their corresponding audio coding formats developed by Microsoft. It is a proprietary technology that forms part of the Windows Media framework. WMA consists of four distinct codecs. The original WMA codec, known simply as WMA, was conceived as a competitor to the popular MP3 and RealAudio codecs. WMA Pro, a newer and more advanced codec, supports multichannel and high resolution audio. A lossless codec, WMA Lossless, compresses audio data without loss of audio fidelity. WMA Voice, targeted at voice content, applies compression using a range of low bit rates. Microsoft has also developed a digital container format called Advanced Systems Format to store audio encoded by WMA.

zlib: DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.
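
Python's standard `zlib` module wraps the zlib library, so a short round trip illustrates the lossless DEFLATE format. The helper name is illustrative; the checks on the first two bytes follow the zlib stream header defined in RFC 1950, where the first byte is 0x78 for DEFLATE with the default 32 KB window and the first two bytes, read as a big-endian number, are a multiple of 31.

```python
import zlib


def deflate_roundtrip(data: bytes, level: int = 9) -> tuple[bytes, bytes]:
    """Compress with zlib/DEFLATE, then decompress; return both results."""
    packed = zlib.compress(data, level)
    return packed, zlib.decompress(packed)


text = b"LZX and DEFLATE are both LZ77-family compressors. " * 20
packed, restored = deflate_roundtrip(text)
print(len(text), "->", len(packed), "bytes; identical:", restored == text)
```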

New Technology File System (NTFS) is a proprietary journaling file system developed by Microsoft. Starting with Windows NT 3.1, it is the default file system of the Windows NT family. It superseded File Allocation Table (FAT) as the preferred filesystem on Windows and is supported in Linux and BSD as well. NTFS reading and writing support is provided using a free and open-source kernel implementation known as NTFS3 in Linux and the NTFS-3G driver in BSD. By using the convert command, Windows can convert FAT32/16/12 into NTFS without the need to rewrite all files. NTFS uses several files typically hidden from the user to store metadata about other files stored on the drive which can help improve speed and performance when reading data. Unlike FAT and High Performance File System (HPFS), NTFS supports access control lists (ACLs), filesystem encryption, transparent compression, sparse files and file system journaling. NTFS also supports shadow copy to allow backups of a system while it is running, but the functionality of the shadow copies varies between different versions of Windows.

OpenEXR is a high-dynamic range, multi-channel raster file format, released as an open standard along with a set of software tools created by Industrial Light & Magic (ILM), under a free software license similar to the BSD license.

RAR is a proprietary archive file format that supports data compression, error correction and file spanning. It was developed in 1993 by Russian software engineer Eugene Roshal and the software is licensed by win.rar GmbH. The name RAR stands for Roshal Archive.

Cabinet is an archive-file format for Microsoft Windows that supports lossless data compression and embedded digital certificates used for maintaining archive integrity. Cabinet files have .cab filename extensions and are recognized by their first four bytes, MSCF. Cabinet files were originally known as Diamond files.
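
The MSCF check, and the way a cabinet folder can record its LZX window size, can be sketched in Python. The field layout below is assumed from Microsoft's [MS-CAB] cabinet format documentation and should be verified against it; the function names are hypothetical. Under that assumption, LZX window exponents 15 to 21 correspond to the 32 KB to 2 MB range Microsoft LZX supports.

```python
import struct

CAB_SIGNATURE = b"MSCF"
# Assumed CFHEADER fields after the signature (little-endian):
# reserved1, cbCabinet, reserved2, coffFiles, reserved3 (five u32),
# versionMinor, versionMajor (two u8), cFolders, cFiles (two u16).
HEADER_FIELDS = struct.Struct("<5I2B2H")


def read_cab_header(blob: bytes) -> dict:
    if len(blob) < 4 + HEADER_FIELDS.size or blob[:4] != CAB_SIGNATURE:
        raise ValueError("not a cabinet file (missing MSCF signature)")
    (_, cb_cabinet, _, coff_files, _,
     minor, major, c_folders, c_files) = HEADER_FIELDS.unpack_from(blob, 4)
    return {
        "cbCabinet": cb_cabinet,  # total size of the cabinet in bytes
        "coffFiles": coff_files,  # offset of the first file entry
        "version": (major, minor),
        "cFolders": c_folders,
        "cFiles": c_files,
    }


def describe_compression(type_compress: int) -> str:
    # Assumed typeCompress layout: the low 4 bits select the method; for
    # LZX, bits 8-12 hold log2 of the window size (15..21 = 32 KB..2 MB).
    method = type_compress & 0x000F
    if method == 3:
        window_bits = (type_compress >> 8) & 0x1F
        return f"LZX, {1 << window_bits}-byte window"
    return {0: "none", 1: "MSZIP", 2: "Quantum"}.get(method, "unknown")
```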

7-Zip: Open-source file archiver

7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own archive format called 7z, but can read and write several others.

7z is a compressed archive file format that supports several different data compression, encryption and pre-processing algorithms. The 7z format initially appeared as implemented by the 7-Zip archiver. The 7-Zip program is publicly available under the terms of the GNU Lesser General Public License. The LZMA SDK 4.62 was placed in the public domain in December 2008. The latest stable version of 7-Zip and LZMA SDK is version 22.01.

LHA (file format)

LHA or LZH is a freeware compression utility and associated file format. It was created in 1988 by Haruyasu Yoshizaki, a doctor, and was originally named LHarc. A complete rewrite of LHarc, tentatively named LHx, was eventually released as LH. It was then renamed LHA to avoid conflicting with the then-new MS-DOS 5.0 LH command. The original LHA and its Windows port, LHA32, are no longer in development because Yoshizaki is too busy with his work.

Microsoft Compiled HTML Help is a Microsoft proprietary online help format, consisting of a collection of HTML pages, an index and other navigation tools. The files are compressed and deployed in a binary format with the extension .CHM, for Compiled HTML. The format is often used for software documentation.

An image file format is a file format for a digital image. There are many formats that can be used, such as JPEG, PNG, and GIF. Most formats up until 2022 were for storing 2D images, not 3D ones. The data stored in an image file format may be compressed or uncompressed. If the data is compressed, it may be done so using lossy compression or lossless compression. For graphic design applications, vector formats are often used. Some image file formats support transparency.

The Windows Imaging Format (WIM) is a file-based disk image format. It was developed by Microsoft to help deploy Windows Vista and subsequent versions of the Windows operating system family, as well as Windows Fundamentals for Legacy PCs.

The Unarchiver: File decompression utility

The Unarchiver is a proprietary freeware data decompression utility, which supports more formats than Archive Utility, the built-in archive unpacker program in macOS. It can also handle filenames in various character encodings, created using operating system versions that use those character encodings. The latest version requires Mac OS X Lion or higher. The Unarchiver does not compress files.

XZ Utils is a set of free software command-line lossless data compressors, including the programs lzma and xz, for Unix-like operating systems and, from version 5.0 onwards, Microsoft Windows. For compression/decompression the Lempel–Ziv–Markov chain algorithm (LZMA) is used. XZ Utils started as a Unix port of Igor Pavlov's LZMA-SDK that has been adapted to fit seamlessly into Unix environments and their usual structure and behavior.

Brotli is a lossless data compression algorithm developed by Google. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling. Brotli is primarily used by web servers and content delivery networks to compress HTTP content, making internet websites load faster. A successor to gzip, it is supported by all major web browsers and has become increasingly popular, as it provides better compression than gzip.

In data compression, BCJ, short for Branch/Call/Jump, refers to a technique that improves the compression of machine code by replacing relative branch addresses with absolute ones. This allows a Lempel–Ziv compressor to identify duplicate targets and encode them more efficiently. On decompression, the inverse filter restores the original encoding. Different BCJ filters are used for different instruction sets, as each uses different opcodes for branching.
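
Python's standard `lzma` module exposes the x86 BCJ filter used in the .xz format, so the filter can be demonstrated without writing one by hand. This sketch assumes a Python build with `lzma` support; the helper names are illustrative.

```python
import lzma

# xz filter chain: the x86 BCJ filter runs first, then LZMA2 compresses
# the filtered bytes. The BCJ filter may not be the last filter in a chain.
X86_CHAIN = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]


def xz_with_bcj(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_XZ, filters=X86_CHAIN)


def xz_decompress(blob: bytes) -> bytes:
    # The .xz container records its filter chain, so no filter arguments
    # are needed here; the inverse BCJ step is applied automatically.
    return lzma.decompress(blob)
```

Whether the BCJ stage actually shrinks the output depends on how much genuine x86 code the input contains; on non-code data it mostly passes bytes through unchanged.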

References

  1. "wimlib: the open source Windows Imaging (WIM) library - Compression algorithm". wimlib.net (https://wimlib.net/compression.html).
  2. "Jonathan Forbes - LinkedIn". Archived from the original on 2010-03-23.
  3. "APC Magazine » Build your own Vista install DVD". Archived from the original on 2006-08-19. Retrieved 2006-08-19.
  4. "Xbox.com | Engineering Blog - Xbox Engineering Blog: Avatar Technology". Archived from the original on 2010-04-11.
  5. "cabextract: Free Software for extracting Microsoft cabinet files". Retrieved 17 March 2020.
  6. "Converting .LIT files for fun and profit". www.kyzer.me.uk.