Zstd

Zstandard
Original author(s): Yann Collet
Developer(s): Yann Collet, Nick Terrell, Przemysław Skibiński [1]
Initial release: 23 January 2015
Stable release: 1.5.6 [2] / 27 March 2024
Written in: C
Operating system: Cross-platform
Platform: Portable
Type: Data compression
License: BSD-3-Clause or GPL-2.0-or-later (dual-licensed)
Website: facebook.github.io/zstd/

Zstandard is a lossless data compression algorithm developed by Yann Collet at Facebook. Zstd is the corresponding reference implementation in C, released as open-source software on 31 August 2016. [3] [4]

Features

Zstandard was designed to give a compression ratio comparable to that of the DEFLATE algorithm (developed in 1991 and used in the original ZIP and gzip programs), but faster, especially for decompression. It is tunable with compression levels ranging from negative 7 (fastest) [5] to 22 (slowest in compression speed, but best compression ratio).

Starting from version 1.3.2 (October 2017), zstd optionally implements very long range search and deduplication (--long, 128 MiB window) similar to rzip or lrzip. [6]

Compression speed can vary by a factor of 20 or more between the fastest and slowest levels, while decompression is uniformly fast, varying by less than 20% between the fastest and slowest levels. [7] The Zstandard command-line tool has an "adaptive" (--adapt) mode that varies the compression level depending on I/O conditions, mainly how fast it can write the output.

Zstd at its maximum compression level gives a compression ratio close to lzma, lzham, and ppmx, and performs better than lza or bzip2. [8] [9] Zstandard reaches the current Pareto frontier, as it decompresses faster than any other currently available algorithm with similar or better compression ratio. [10] [11]

Dictionaries can have a large impact on the compression ratio of small files, so Zstandard can use a user-provided compression dictionary. It also offers a training mode, able to generate a dictionary from a set of samples. [12] [13] In particular, one dictionary can be loaded to process large sets of files with redundancy between files, but not necessarily within each file, e.g., log files.
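The effect of a shared dictionary on small inputs can be illustrated with Python's standard zlib module, which accepts a preset dictionary through its zdict parameter. This is a stand-in rather than Zstandard itself: a DEFLATE preset dictionary is a plain byte string, not a trained Zstandard dictionary, and the log lines and helper names below are invented for illustration.

```python
import zlib

# Hypothetical log records: redundant across records, not within one record.
SAMPLES = [
    b'2024-03-27T10:00:01Z INFO request_id=1 path=/api/v1/users status=200',
    b'2024-03-27T10:00:02Z INFO request_id=2 path=/api/v1/orders status=200',
    b'2024-03-27T10:00:03Z WARN request_id=3 path=/api/v1/users status=404',
]

# A crude "dictionary": raw content of earlier samples.
# (A real trainer would select the most useful substrings instead.)
DICTIONARY = b''.join(SAMPLES[:2])


def compress(data: bytes, zdict: bytes = b'') -> bytes:
    """Compress one record, optionally priming the compressor with zdict."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def decompress(blob: bytes, zdict: bytes = b'') -> bytes:
    """Decompress one record; the same dictionary must be supplied again."""
    if zdict:
        decomp = zlib.decompressobj(zdict=zdict)
    else:
        decomp = zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


if __name__ == '__main__':
    record = SAMPLES[2]
    print('without dictionary:', len(compress(record)), 'bytes')
    print('with dictionary:   ', len(compress(record, DICTIONARY)), 'bytes')
```

The record compressed with the dictionary should come out smaller, because most of its content can be expressed as references into the dictionary. The decompressor needs exactly the same dictionary to reverse the process.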

Design

Zstandard combines a dictionary-matching stage (LZ77) with a large search window and a fast entropy-coding stage. It uses both Huffman coding (used for entries in the Literals section) [14] and finite-state entropy (FSE), a fast table-based variant of asymmetric numeral systems (tANS), used for entries in the Sequences section. Because of the way that FSE carries over state between symbols, decompression involves processing symbols within the Sequences section of each block in reverse order (from last to first).
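The last-in, first-out behaviour of an ANS state can be seen in a toy coder. The sketch below uses the range variant (rANS) with an unbounded Python integer as its state. It is not the table-based tANS/FSE coder that Zstandard uses, and the frequency table, message and function names are assumptions made up for the example.

```python
# Toy rANS coder built on Python's arbitrary-precision integers.
# FREQ is an invented frequency table; zstd's FSE instead uses
# precomputed tables and a bounded state that is flushed to a bitstream.
FREQ = {'a': 5, 'b': 2, 'c': 1}
TOTAL = sum(FREQ.values())

# CUM[s] is the first slot (in 0..TOTAL-1) assigned to symbol s.
CUM = {}
_acc = 0
for _sym, _f in FREQ.items():
    CUM[_sym] = _acc
    _acc += _f


def encode(message: str, state: int = 1) -> int:
    """Fold every symbol of message, first to last, into one integer."""
    for sym in message:
        f, c = FREQ[sym], CUM[sym]
        state = (state // f) * TOTAL + c + (state % f)
    return state


def decode(state: int, count: int) -> str:
    """Pop count symbols off the state; the last one encoded comes out first."""
    out = []
    for _ in range(count):
        slot = state % TOTAL
        sym = next(s for s in FREQ if CUM[s] <= slot < CUM[s] + FREQ[s])
        state = FREQ[sym] * (state // TOTAL) + slot - CUM[sym]
        out.append(sym)
    return ''.join(out)


if __name__ == '__main__':
    msg = 'abacab'
    print(decode(encode(msg), len(msg)))  # symbols come back in reverse order
```

Because the state behaves like a stack, one side has to run backward: either the encoder consumes its input in reverse, or the decoder walks the encoded data from the end.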

Usage

Zstandard
Filename extension: .zst [15]
Internet media type: application/zstd [15]
Magic number: 28 b5 2f fd [15]
Type of format: Data compression
Standard: RFC 8878
Website: github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md

Zstandard Dictionary
Internet media type: application/zstd
Magic number: 37 a4 30 ec [15]
Standard: RFC 8878
Website: github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#dictionary-format
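The two magic numbers listed above are enough to tell a Zstandard frame from a Zstandard dictionary by looking at the first four bytes. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the check does not validate anything beyond those four bytes.

```python
import struct

ZSTD_FRAME_MAGIC = bytes.fromhex('28b52ffd')  # start of a Zstandard frame
ZSTD_DICT_MAGIC = bytes.fromhex('37a430ec')   # start of a Zstandard dictionary


def classify(prefix: bytes) -> str:
    """Classify data by its first four bytes (hypothetical helper)."""
    head = prefix[:4]
    if head == ZSTD_FRAME_MAGIC:
        return 'frame'
    if head == ZSTD_DICT_MAGIC:
        return 'dictionary'
    return 'unknown'


def magic_as_uint32(magic: bytes) -> int:
    """Read the four magic bytes as a little-endian 32-bit integer."""
    return struct.unpack('<I', magic)[0]


if __name__ == '__main__':
    print(classify(ZSTD_FRAME_MAGIC + b'rest of file'))
    print(hex(magic_as_uint32(ZSTD_FRAME_MAGIC)))
```

Read as a little-endian integer, the frame magic is 0xFD2FB528 and the dictionary magic is 0xEC30A437, which is how RFC 8878 writes them.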

The Linux kernel has included Zstandard since November 2017 (version 4.14) as a compression method for the btrfs and squashfs filesystems. [16] [17] [18]

In 2017, Allan Jude integrated Zstandard into the FreeBSD kernel, [19] and it was subsequently integrated as a compressor option for core dumps (both user programs and kernel panics). It was also used to create a proof-of-concept OpenZFS compression method [7] which was integrated in 2020. [20]

The AWS Redshift and RocksDB databases include support for field compression using Zstandard. [21]

In March 2018, Canonical tested [22] the use of zstd as a deb package compression method by default for the Ubuntu Linux distribution. Compared with xz compression of deb packages, zstd at level 19 decompresses significantly faster, but at the cost of 6% larger package files. Support was added to Debian (and subsequently, Ubuntu) in April 2018 (in version 1.6~rc1). [23] [22] [24]

In 2018, the algorithm was published as RFC 8478, which also defines an associated media type "application/zstd", filename extension "zst", and HTTP content encoding "zstd". [25]

Arch Linux added support for zstd as a package compression method in October 2019 with the release of the pacman 5.2 package manager [26] and in January 2020 switched from xz to zstd for the packages in the official repository. Arch uses zstd -c -T0 --ultra -20 -. Compared with xz, the combined size of all compressed packages increased by 0.8%, decompression is 14 times faster, and decompression memory increased by 50 MiB when using multiple threads; compression memory also increases, scaling with the number of threads used. [27] [28] [29] Arch Linux later also switched to zstd as the default compression algorithm for the mkinitcpio initial ramdisk generator. [30]

Fedora added Zstandard support to RPM in May 2018 (Fedora release 28) and used it for packaging the release in October 2019 (Fedora 31). [31] In Fedora 34, the default Btrfs filesystem is transparently compressed with zstd. [32] [33]

A full implementation of the algorithm, with an option to choose the compression level, is used in the .NSZ/.XCZ [34] file formats developed by the homebrew community for the Nintendo Switch hybrid game console. [35] Similarly, it is one of many supported compression algorithms in the .RVZ Wii and GameCube disc image file format.

On 15 June 2020, version 6.3.8 of the ZIP file format specification assigned Zstandard the compression method number 93, deprecating the number 20 that version 6.3.7, released on 1 June, had assigned to it. [36] [37]

In March 2024, Google Chrome version 123 (and Chromium-based browsers such as Brave or Microsoft Edge) added zstd support in the HTTP header Content-Encoding. [38] In May 2024, Firefox release 126.0 added zstd support in the HTTP header Content-Encoding. [39]
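On the server side, using the "zstd" content encoding starts with checking whether the client advertised it in its Accept-Encoding request header. The sketch below shows one way such a check might look. The function name is hypothetical, and for brevity it ignores the "*" wildcard and treats a malformed q-value as a refusal.

```python
def accepts_zstd(accept_encoding: str) -> bool:
    """Return True if an Accept-Encoding header value allows "zstd".

    Simplified: the "*" wildcard is ignored, and only the first "zstd"
    entry in the header is considered.
    """
    for item in accept_encoding.split(','):
        token, _, params = item.strip().partition(';')
        if token.strip().lower() != 'zstd':
            continue
        q = 1.0  # a coding listed without a q-value is fully acceptable
        for param in params.split(';'):
            name, _, value = param.strip().partition('=')
            if name.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        return q > 0
    return False


if __name__ == '__main__':
    header = 'gzip, deflate, br, zstd'
    if accepts_zstd(header):
        print('Content-Encoding: zstd')
```

A server that gets True here can compress the response body with Zstandard and label it with Content-Encoding: zstd. Otherwise it falls back to another encoding or sends the body uncompressed.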

License

The reference implementation, published on GitHub, is licensed under the BSD license. [40] From version 1.0, it carried an additional Grant of Patent Rights. [41]

From version 1.3.1, [42] this patent grant was dropped and the license was changed to a BSD + GPLv2 dual license. [43]

References

  1. "Contributors to facebook/zstd". github.com. Archived from the original on 27 January 2021. Retrieved 26 January 2021.
  2. "Release Zstandard v1.5.6 - Chrome Edition · facebook/zstd". Retrieved 27 March 2024.
  3. Sergio De Simone (2 September 2016). "Facebook Open-Sources New Compression Algorithm Outperforming Zlib". InfoQ. Retrieved 20 April 2019.
  4. "Life imitates satire: Facebook touts zlib killer just like Silicon Valley's Pied Piper". The Register. 31 August 2016. Retrieved 6 September 2016.
  5. "Release Zstandard v1.3.4 - faster everything · facebook/zstd". GitHub. Retrieved 27 March 2024.
  6. "Command Line Interface for Zstandard library". GitHub. 28 October 2021.
  7. "ZStandard in ZFS" (PDF). open-zfs.org. 2017. Retrieved 20 April 2019.
  8. Matt Mahoney. "Silesia Open Source Compression Benchmark". Retrieved 10 May 2019.
  9. Matt Mahoney (29 August 2016). "Large Text Compression Benchmark, .2157 zstd". Retrieved 1 September 2016.
  10. TurboBench: Static/Dynamic web content compression benchmark, PowTurbo
  11. Matt Mahoney, Silesia Open Source Compression Benchmark
  12. "Facebook developers report massive speedups and compression ratio improvements when using dictionaries" (PDF). Fermilab. 11 October 2017. Retrieved 27 March 2024.
  13. "Smaller and faster data compression with Zstandard". Facebook. 31 August 2016.
  14. "facebook/zstd". GitHub. 28 October 2021.
  15. Collet, Yann (February 2021). Kucherawy, Murray S. (ed.). Zstandard Compression and the application/zstd Media Type. Internet Engineering Task Force Request for Comments. doi:10.17487/RFC8878. RFC 8878. Retrieved 26 February 2023.
  16. Corbet, Jonathan (17 September 2017). "The rest of the 4.14 merge window [LWN.net]". lwn.net. Retrieved 27 March 2024.
  17. "Linux_4.14 - Linux Kernel Newbies". Kernelnewbies.org. 30 December 2017. Retrieved 16 August 2018.
  18. Larabel, Michael (8 September 2017). "Zstd Compression For Btrfs & Squashfs Set For Linux 4.14, Already Used Within Facebook - Phoronix". www.phoronix.com.
  19. "Integrate ZSTD into the kernel · freebsd/Freebsd-SRC@28ef165". GitHub.
  20. "Add ZSTD support to ZFS · openzfs/ZFS@10b3c7f". GitHub.
  21. "Zstandard Encoding - Amazon Redshift". 20 April 2019.
  22. Larabel, Michael (12 March 2018). "Canonical Working On Zstd-Compressed Debian Packages For Ubuntu". phoronix.com. Phoronix Media. Retrieved 29 October 2019. The developers at Canonical are considering a feature freeze exception to get this newly-developed Zstd Apt/Dpkg support in Ubuntu 18.04 LTS. In doing so, they mention they would be looking at enabling Zstd compression for packages by default in Ubuntu 18.10.
  23. "New Ubuntu Installs Could Be Speed Up by 10% with the Zstd Compression Algorithm". Softpedia. 12 March 2018. Retrieved 13 August 2018.
  24. "Debian Changelog for apt". Debian. 19 April 2021. Retrieved 7 November 2022.
  25. Collet, Yann (October 2018). Kucherawy, Murray S. (ed.). Zstandard Compression and the application/zstd Media Type. Internet Engineering Task Force Request for Comments. doi:10.17487/RFC8478. RFC 8478. Retrieved 7 October 2020.
  26. Larabel, Michael (16 October 2019). "Arch Linux Nears Roll-Out of ZSTD Compressed Packages for Faster Pacman Installs". Phoronix.
  27. Broda, Mara (4 January 2020). "Now using Zstandard instead of xz for package compression". Arch Linux. Retrieved 5 January 2020.
  28. Broda, Mara (25 March 2019). "RFC: (devtools) Changing default compression method to zstd". arch-dev-public (Mailing list).
  29. Broda, Mara; Polyak, Levente (27 December 2019). "makepkg.conf: change default compression method to zstd". GitHub.
  30. Razzolini, Giancarlo (19 February 2021). "News: Moving to Zstandard images by default on mkinitcpio". Arch Linux. Retrieved 28 December 2021.
  31. "Changes/Switch RPMS to ZSTD compression". Fedora Project Wiki.
  32. "Fedora Workstation 34 feature focus: Btrfs transparent compression". Fedora Magazine. 14 April 2021. Retrieved 12 May 2022.
  33. "Changes/BtrfsTransparentCompression". Fedora Project Wiki. Retrieved 12 May 2022.
  34. "RELEASE - nsZip - NSP compressor/decompressor to reduce storage". GBAtemp.net - The Independent Video Game Community. 20 October 2019. Retrieved 3 November 2019.
  35. Bosshard, Nico (31 October 2019), nsZip is a tool to compress/decompress Nintendo Switch games using the here specified NSZ file format: nicoboss/nsZip, retrieved 3 November 2019
  36. APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.8, 15 June 2020, retrieved 7 July 2020
  37. APPNOTE.TXT - .ZIP File Format Specification Version: 6.3.7, 1 June 2020, retrieved 6 June 2020
  38. "New in Chrome 123 | Chrome Blog". Chrome for Developers. 19 March 2024. Retrieved 16 April 2024.
  39. "Firefox 126.0, See All New Features, Updates and Fixes". Retrieved 15 May 2024.
  40. "Facebook open sources Zstandard data compression algorithm, aims to replace technology behind Zip". ZDnet. 31 August 2016. Retrieved 1 September 2016.
  41. "zstd/PATENTS at v1.3.0 · facebook/zstd". GitHub. Retrieved 27 March 2024.
  42. "Release Zstandard v1.3.1 · facebook/zstd". GitHub. Retrieved 27 March 2024.
  43. "New license by Cyan4973 · Pull Request #801 · facebook/zstd". GitHub. Retrieved 27 March 2024.