LZX (algorithm)

LZX is an LZ77-family compression algorithm. It is also the name of a file archiver that uses it. Both were invented by Jonathan Forbes and Tomi Poutanen in the 1990s.
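LZX belongs to the LZ77 family, so its core step replaces repeated byte strings with back-references (distance, length) into a sliding window of recently seen data. The Python sketch below shows only that generic step, using a naive greedy search. The function names `lz77_tokens` and `lz77_decode` are hypothetical, and so are the `window`, `min_match` and `max_match` defaults. They are illustrative values, not LZX's actual limits. LZX's own token encoding and bitstream format are not reproduced.

```python
def lz77_tokens(data: bytes, window: int = 65536, min_match: int = 3, max_match: int = 256):
    """Greedy LZ77 parse: emit literals and (distance, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Only positions inside the sliding window may be referenced.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy), as LZ77 allows.
            while (length < max_match and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append(("match", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the input by replaying literals and back-references."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out.append(token[1])
        else:
            _, dist, length = token
            # Copy one byte at a time so overlapping matches reproduce correctly.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


data = b"abcabcabcabcX"
print(lz77_tokens(data))
# [('literal', 97), ('literal', 98), ('literal', 99), ('match', 3, 9), ('literal', 88)]
assert lz77_decode(lz77_tokens(data)) == data
```

The repeated "abc" run becomes a single back-reference with distance 3 and length 9. Its source overlaps its own output, which is why the decoder copies byte by byte.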

Instances of use of the LZX algorithm

Amiga LZX

LZX was publicly released as an Amiga file archiver in 1995, while the authors were studying at the University of Waterloo in Canada. The software was shareware, which was common for compression software at the time. The registered version contained fixes and improvements not available in the evaluation version. In 1997, the authors gave away a free keyfile that allowed anyone to use the registered version, because they had stopped work on the archiver and were no longer accepting registrations.

Microsoft Cabinet files

In 1996, Forbes went to work for Microsoft,[1] and Microsoft's cabinet archiver was enhanced to include the LZX compression method. Improvements included a variable search window size. Amiga LZX used a fixed 64 KB window, whereas Microsoft LZX allowed any power of two from 32 to 2048 kilobytes (32,768 to 2,097,152 bytes). A special preprocessor was also added to detect Intel 80x86 "CALL" instructions and convert their operands from relative to absolute addresses. Calls to the same target therefore produce repeated byte strings that the compressor can match, which improves compression of 80x86 binary code.
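The CALL preprocessing idea can be illustrated with a simplified Python sketch. On x86, a near CALL is the byte 0xE8 followed by a 32-bit little-endian displacement measured from the end of the 5-byte instruction. The sketch rewrites that displacement as the absolute target, so two calls to the same function end up with identical operand bytes. This is an illustration under that assumption only, not Microsoft's exact transform, which differs in details not covered here (for example, which operands it converts). The names `e8_encode`, `e8_decode` and `_translate_e8` are hypothetical.

```python
def _translate_e8(data: bytes, direction: int) -> bytes:
    """Rewrite the 32-bit operand after every 0xE8 byte (simplified sketch)."""
    out = bytearray(data)
    i = 0
    while i + 5 <= len(out):
        if out[i] == 0xE8:
            operand = int.from_bytes(out[i + 1:i + 5], "little")
            # The displacement is relative to the end of the 5-byte instruction,
            # so target = (i + 5) + displacement, taken modulo 2**32.
            new = (operand + direction * (i + 5)) % 2**32
            out[i + 1:i + 5] = new.to_bytes(4, "little")
            i += 5  # skip the operand so encoder and decoder visit the same positions
        else:
            i += 1
    return bytes(out)


def e8_encode(data: bytes) -> bytes:
    return _translate_e8(data, +1)  # relative displacement -> absolute target


def e8_decode(data: bytes) -> bytes:
    return _translate_e8(data, -1)  # absolute target -> relative displacement


# Two calls to the same target (offset 100) from different positions.
code = bytearray(40)
code[0] = 0xE8
code[1:5] = (100 - 5).to_bytes(4, "little")
code[20] = 0xE8
code[21:25] = (100 - 25).to_bytes(4, "little")

encoded = e8_encode(bytes(code))
print(code[1:5] == code[21:25], encoded[1:5] == encoded[21:25])  # False True
assert e8_decode(encoded) == bytes(code)
```

The transform is reversible because both directions scan the data the same way. The 0xE8 bytes themselves are never modified, and the skipped operand bytes are the same in both passes.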

Microsoft Compiled HTML Help (CHM) files

When Microsoft introduced Microsoft Compiled HTML Help, the replacement for its classic Help file format, it chose to compress all of the HTML data with the LZX algorithm. However, to improve random-access speed, the compressor was altered to reset itself after every 64-kilobyte (65,536-byte) interval and to re-align to a 16-bit boundary after every 32-kilobyte interval. The HTML Help software can therefore seek directly to the start of the 64-kilobyte interval that contains the requested data and begin decoding there, rather than always decoding from the beginning of the compressed data stream.
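The arithmetic behind this random access can be sketched in a few lines of Python, using the 64 KB reset interval described above. The list `reset_offsets` is a hypothetical input that gives the compressed position at which each reset interval begins. The function `plan_seek` is also hypothetical, and the sketch does not parse any real CHM structure or model the 32 KB re-alignment.

```python
RESET_INTERVAL = 64 * 1024  # uncompressed bytes between LZX resets, per the text above


def plan_seek(offset: int, reset_offsets: list, interval: int = RESET_INTERVAL):
    """Return (compressed position to start at, decoded bytes to discard).

    reset_offsets[k] is assumed to hold the compressed position at which
    the k-th reset interval begins (hypothetical input for this sketch).
    """
    block = offset // interval               # reset interval containing `offset`
    compressed_start = reset_offsets[block]  # decoding can restart cleanly here
    discard = offset - block * interval      # output to skip before reaching `offset`
    return compressed_start, discard


print(plan_seek(200000, [0, 20000, 41000, 60500]))  # (60500, 3392)
```

To reach uncompressed offset 200,000, the decoder starts at the fourth reset point and throws away only 3,392 decoded bytes. Without resets, it would have to decode all 200,000 preceding bytes. With resets, it never discards more than 65,535 bytes.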

Microsoft Reader (LIT) files

Microsoft LIT files for Microsoft Reader are simply an extension of the CHM file format, and thus also use LZX compression.

Windows Imaging Format (WIM) files

Windows Imaging Format (WIM), the installation and drive-image file format of Windows Vista and Windows 7, uses LZX as one of its compression methods.[2]

Xbox Live Avatars

Microsoft uses LZX compression on Xbox Live Avatars to reduce their disk and bandwidth requirements.[3]

Decompressing LZX files

The unlzx program and XAD can unpack Amiga LZX archives. The cabextract program can unpack Microsoft cabinet files that use the LZX method. Many cross-platform tools can decompile or view CHM files, as listed in the CHM article. LIT files can be unpacked with the Convert LIT software.

References

  1. http://www.linkedin.com/pub/jonathan-forbes/3/70a/a4b
  2. "Archived copy". Archived from the original on 2006-08-19. Retrieved 2006-08-19.
  3. http://www.xbox.com/en-US/live/engineeringblog/xbox-live-avatar-technology.htm