LZFSE

Last updated • 2 min readFrom Wikipedia, The Free Encyclopedia
LZFSE
Developer(s) Apple
Initial release2015
Stable release
lzfse-1.0 / 8 May 2017;7 years ago (2017-05-08)
Repository lzfse on GitHub
Written in C
Operating system macOS, iOS, Linux [1]
Available inC
Type Data compression
License 3-clause New BSD License

LZFSE (Lempel–Ziv Finite State Entropy) is an open source lossless data compression algorithm created by Apple Inc. It was released with a simpler algorithm called LZVN. [2]

Contents

Overview

The name is an acronym for Lempel–Ziv and finite-state entropy [3] (implementation of asymmetric numeral systems). LZFSE was introduced by Apple at its Worldwide Developer Conference 2015. It shipped with that year's iOS 9 and OS X 10.11 releases.

Apple claims that LZFSE compresses with a ratio comparable to that of zlib (DEFLATE) and decompresses two to three times faster while using fewer resources, therefore offering higher energy efficiency than zlib. It was aimed for scenarios where decompression speed and rate should be prioritised equally. [3] Part of this energy efficiency was achieved by optimising the algorithm for modern micro-architectures, specifically focusing on arm64. [4] Third-party benchmarking confirms that LZFSE decompresses faster than zlib, but also suggests that many other modern compression algorithms may have more favorable compression algorithm performance characteristics such as density, compression speed and decompression speed by a significant margin. [5]

According to the Squash Benchmark, LZFSE is similar in speed to zstd (level 6), but has a slightly worse ratio. LZVN is similar in speed to LZ4 level 4, with a slightly worse ratio as well. [6] Neither LZFSE nor LZVN is tunable at runtime, although a few constants can be tweaked at compile time for the usual speed-ratio trade-off. [7]

Implementation

A reference C library written by Eric Bainville was made available under the 3-clause BSD License after WWDC 2016. It includes an executable to compress and decompress LZFSE streams as well. There are no plans to expose an LZVN API. [1]

Apple's LZFSE implementation uses a simpler algorithm called LZVN when the input is smaller than LZFSE_ENCODE_LZVN_THRESHOLD (4096 bytes). This is a LZSS-type algorithm without entropy encoding but with three widths of REP (L,M,D) packets. In the open source reference implementation, Apple explains that LZFSE does not perform as well for small sizes, so LZVN is used instead. [7] This algorithm in libfastCompression.a was discovered earlier as the default kernelcache compression method in Mac OS X Yosemite Developer Preview 1 (2014), replacing the legacy lzss compression from Haruhiko Okumura. [8]

Usage

AppleFSCompression.framework (AFSC), the mechanism for quasi-transparent compression in HFS Plus and Apple File System, supports LZFSE and LZVN since OS X 10.9.

Apple's Disk Images framework has offered an LZFSE-based encoding called ULFO since Mac OS X 10.11, [9] accessible via hdiutil(1) [10] and some third-party image utilities.

Apple introduced the Apple Archive format and its associated API in macOS High Sierra in 2017. [11] The extension name is .aar (since macOS Big Sur, used to be .yaa). Encryption was introduced in macOS Monterey, when AA became the default Archive Utility format. Three command-line utilities are available in macOS to handle AA files. [12] [13] Of third-party programs, Keka is able to use the system APIs to handle AA files, but no independent implementations exist on other systems. [14]

See also

Related Research Articles

gzip GNU file compression/decompression tool

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU. Version 0.1 was first publicly released on 31 October 1992, and version 1.0 followed in February 1993.

Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistical redundancy. By contrast, lossy compression permits reconstruction only of an approximation of the original data, though usually with greatly improved compression rates.

zlib DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations including LZW, LZSS, LZMA and others. Besides their academic influence, these algorithms formed the basis of several ubiquitous compression schemes, including GIF and the DEFLATE algorithm used in PNG and ZIP.

In computing, Deflate is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).

<span class="mw-page-title-main">7-Zip</span> Open-source file archiver

7-Zip is a free and open-source file archiver, a utility used to place groups of files within compressed containers known as "archives". It is developed by Igor Pavlov and was first released in 1999. 7-Zip has its own archive format called 7z, but can read and write several others.

rzip is a huge-scale data compression computer program designed around initial LZ77-style string matching on a 900 MB dictionary window, followed by bzip2-based Burrows–Wheeler transform and entropy coding (Huffman) on 900 kB output chunks.

Lempel–Ziv–Oberhumer (LZO) is a lossless data compression algorithm that is focused on decompression speed.

Snappy is a fast data compression and decompression library written in C++ by Google based on ideas from LZ77 and open-sourced in 2011. It does not aim for maximum compression, or compatibility with any other compression library; instead, it aims for very high speeds and reasonable compression. Compression speed is 250 MB/s and decompression speed is 500 MB/s using a single core of a circa 2011 "Westmere" 2.26 GHz Core i7 processor running in 64-bit mode. The compression ratio is 20–100% lower than gzip.

Lempel–Ziv–Storer–Szymanski (LZSS) is a lossless data compression algorithm, a derivative of LZ77, that was created in 1982 by James A. Storer and Thomas Szymanski. LZSS was described in article "Data compression via textual substitution" published in Journal of the ACM.

Grammar-based codes or Grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms. To compress a data sequence , a grammar-based code transforms into a context-free grammar . The problem of finding a smallest grammar for an input sequence is known to be NP-hard, so many grammar-transform algorithms are proposed from theoretical and practical viewpoints. Generally, the produced grammar is further compressed by statistical encoders like arithmetic coding.

A timeline of events related to  information theory,  quantum information theory and statistical physics,  data compression,  error correcting codes and related subjects.

<span class="mw-page-title-main">HTTP compression</span> Capability that can be built into web servers and web clients

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

mod_deflate is an optional module for the Apache HTTP Server, Apache v2.0 and later. It is based on Deflate lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. This module provides the DEFLATE output filter that allows output from Apache HTTP server to be compressed before being sent to the client over the network. It also provides a filter for decompressing a gzip compressed response body.

<span class="mw-page-title-main">Zopfli</span> Data compression software

Zopfli is a data compression library that performs Deflate, gzip and zlib data encoding. It achieves higher compression ratios than mainstream Deflate and zlib implementations at the cost of being slower. Google first released Zopfli in February 2013 under the terms of Apache License 2.0.

LZ4 is a lossless data compression algorithm that is focused on compression and decompression speed. It belongs to the LZ77 family of byte-oriented compression schemes.

Brotli is a lossless data compression algorithm developed by Google. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling. Brotli is primarily used by web servers and content delivery networks to compress HTTP content, making internet websites load faster. A successor to gzip, it is supported by all major web browsers and has become increasingly popular, as it provides better compression than gzip.

Asymmetric numeral systems (ANS) is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda from Jagiellonian University, used in data compression since 2014 due to improved performance compared to previous methods. ANS combines the compression ratio of arithmetic coding, with a processing cost similar to that of Huffman coding. In the tabled ANS (tANS) variant, this is achieved by constructing a finite-state machine to operate on a large alphabet without using multiplication.

Zstandard is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the corresponding reference implementation in C, released as open-source software on 31 August 2016.

842, 8-4-2, or EFT is a data compression algorithm. It is a variation on Lempel–Ziv compression with a limited dictionary length. With typical data, 842 gives 80 to 90 percent of the compression of LZ77 with much faster throughput and less memory use. Hardware implementations also provide minimal use of energy and minimal chip area.

References

  1. 1 2 Bainville, Eric (2016-06-07). "LZFSE compression library and command line tool". GitHub. Retrieved 2016-07-04.
  2. Apple Inc. "Data Compression - Compression | Apple Developer Documentation". developer.apple.com. Retrieved 2017-04-07.
  3. 1 2 De Simone, Sergio. "Apple Open-Sources its New Compression Algorithm LZFSE". infoq. Retrieved 2016-07-07.
  4. Apple Inc. (2015-06-12). "Low Energy, High Performance: Compression and Accelerate – WWDC 2015 – Apple Developer Videos". developer.apple.com. Retrieved 2017-03-05. pdf Archived 2016-04-18 at the Wayback Machine
  5. "Compression Benchmark" . Retrieved 2018-08-10.
  6. "Squash Compression Benchmark". GitHub. Squash. Retrieved 25 December 2019.
  7. 1 2 "lzfse_tunables.h". GitHub. 18 December 2019. Retrieved 22 December 2019.
  8. Piker-Alpha (4 June 2014). "OS X 10.10 Yosemite DP1 kernel(cache)". Pike's Universum. Retrieved 22 December 2019.
  9. Tsai, Michael (2015-10-07). "LZFSE Disk Images in El Capitan". Archived from the original on 2017-04-09. Retrieved 2022-04-15.
  10. "hdiutil(1) mojave man page" . Retrieved 2022-04-15.
  11. "Apple Archive". Apple Developer Documentation.
  12. "Inside Apple Archive: more than a compression format". The Eclectic Light Company. 10 May 2022.
  13. "AA(1): Manipulate Apple Archives". keith.github.io.
  14. "AppleArchive support · Issue #829 · aonez/Keka". GitHub.
  15. "compression_algorithm". Apple Developer Documentation. Apple Inc. Retrieved 2019-08-11.