Brotli

Original author(s) Jyrki Alakuijala, Zoltán Szabadka
Developer(s) Jyrki Alakuijala, Eugene Kliuchnikov, Robert Obryk, Zoltán Szabadka, Lode Vandevenne
Initial release 15 October 2013
Stable release 1.1.0 [1] / 31 August 2023
Written in C
Operating system Cross-platform
Platform Portable
Type Data compression
License MIT License
Website brotli.org

Brotli is a lossless data compression algorithm developed by Google. It combines the general-purpose LZ77 lossless compression algorithm, Huffman coding and second-order context modelling. Brotli is primarily used by web servers and content delivery networks to compress HTTP content so that websites load faster. A successor to gzip, it is supported by all major web browsers and has become increasingly popular because it provides better compression than gzip. [citation needed]

History

Google employees Jyrki Alakuijala and Zoltán Szabadka initially developed Brotli in 2013 to decrease the size of transmissions of WOFF web fonts. [2] Alakuijala and Szabadka completed the Brotli specification during 2013–2016. The specification was accompanied by a reference implementation developed by two additional authors, Evgenii Kliuchnikov and Lode Vandevenne, who had previously developed Google's zopfli implementation of deflate and gzip compatible compression in 2013. [3] :1 Unlike zopfli, which was a reimplementation of an existing data format specification, Brotli was a new data format and allowed the authors to improve compression ratios even further. [4]

The Brotli specification was generalized in September 2015 for HTTP stream compression (content-encoding type "br"). This generalized iteration also improved the compression ratio by using a predefined dictionary of frequently used words and phrases. The version released that month by Google's software engineers contained enhancements in generic lossless data compression, with particular emphasis on HTTP compression. The encoder was partly rewritten: the compression ratio improved, both the encoder and the decoder became faster, the streaming API was improved, and more compression quality levels were added. The release also improved performance across platforms and reduced the memory needed for decoding. [4]
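A client advertises support for the "br" content coding in its Accept-Encoding request header, and the server picks a coding it can produce. The following is a minimal sketch of that selection in Python, assuming simplified header parsing; the function names and the server preference order are illustrative and are not taken from any particular web server.

```python
def parse_accept_encoding(header):
    """Parse an Accept-Encoding header into a {coding: q-value} dict."""
    prefs = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # a coding without an explicit q-value has weight 1
        for param in params:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs[name.lower()] = q
    return prefs


def choose_encoding(header, server_order=("br", "gzip")):
    """Pick the server-supported coding with the highest client weight.

    Ties go to the coding listed first in server_order (hypothetical
    server preference). Returns "identity" if nothing matches.
    """
    prefs = parse_accept_encoding(header)
    best, best_q = "identity", 0.0
    for coding in server_order:
        q = prefs.get(coding, prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = coding, q
    return best


print(choose_encoding("gzip, deflate, br"))
print(choose_encoding("gzip;q=1.0, br;q=0.5"))
```

A server that selects "br" would then send the compressed body with a `Content-Encoding: br` response header.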

The Internet Engineering Task Force approved the Brotli compressed data format specification as an informational Request for Comments (RFC 7932) in July 2016. [5] The Brotli data format is an integral part of the second iteration of the Web Open Font Format, [5] :3 which was recognized in a 2021 Technology & Engineering Emmy Award from the National Academy of Television Arts & Sciences for font technology standardization at W3C. [6] [7]

Browser support for Brotli has grown over the years: as of July 2022, 96% of users worldwide used a browser that supports the format. [8]

In 2016, Dropbox reimplemented Brotli in Rust to meet its requirement for stronger protection against a malicious client. In 2018, it implemented a previously missing feature: the ability to append to a Brotli-compressed file. [9] [10] [11]

Algorithm

Brotli's new file format allowed its authors to improve upon Deflate through several algorithmic and format-level changes: context models for literals and copy distances, description of copy distances through past distances, a move-to-front queue in entropy code selection, joint entropy coding of literal and copy lengths, graph algorithms in block splitting, and a larger backward reference window.
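One of the listed techniques, move-to-front, can be illustrated in isolation. The sketch below shows the generic move-to-front transform in Python, not Brotli's bitstream-level use of it; the function names and sample data are illustrative. Recently used symbols move to the front of a table, so repeated symbols turn into runs of small indices that an entropy coder can represent cheaply.

```python
def mtf_encode(symbols, alphabet):
    """Replace each symbol by its current position in a self-organizing table."""
    table = list(alphabet)
    indices = []
    for symbol in symbols:
        position = table.index(symbol)
        indices.append(position)
        # Move the symbol just used to the front of the table.
        table.insert(0, table.pop(position))
    return indices


def mtf_decode(indices, alphabet):
    """Invert mtf_encode by replaying the same table updates."""
    table = list(alphabet)
    symbols = []
    for position in indices:
        symbol = table.pop(position)
        symbols.append(symbol)
        table.insert(0, symbol)
    return symbols


data = [2, 2, 2, 0, 0, 1, 1, 1]
encoded = mtf_encode(data, range(3))
print(encoded)
assert mtf_decode(encoded, range(3)) == data
```

In the sample, five of the eight output indices are zero, which is the skew that makes the transformed sequence easier to compress.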

Unlike most general-purpose compression algorithms, Brotli uses a predefined dictionary, roughly 120 KiB in size, in addition to the dynamically populated ("sliding window") dictionary. The predefined dictionary contains over 13,000 common words, phrases and other substrings derived from a large corpus of text and HTML documents. [12] [3] Using a predefined dictionary has been shown to increase compression where a file mostly contains commonly used words. [13] However, according to Alakuijala, the predefined dictionary does not detract from Brotli's generality and is not the main reason for the improved compression. He claims that Brotli with an all-zero dictionary still performs well on web content because of its algorithmic advances. [14]
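The effect of a predefined dictionary can be observed with Python's standard zlib module, which lets a caller supply a preset dictionary for DEFLATE. This is only an analogy: the sketch uses DEFLATE rather than Brotli, and the tiny dictionary and sample page below are made up for illustration, whereas Brotli's dictionary is fixed by the format.

```python
import zlib

# Hypothetical preset dictionary of substrings expected in small HTML pages.
PRESET = (b'<!DOCTYPE html><html><head><meta charset="utf-8"><title></title>'
          b'</head><body><div class=""></div></body></html>')


def deflate(data, zdict=None):
    """Compress with DEFLATE, optionally primed with a preset dictionary."""
    if zdict is None:
        compressor = zlib.compressobj(level=9)
    else:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    return compressor.compress(data) + compressor.flush()


def inflate(blob, zdict=None):
    """Decompress; the same dictionary must be supplied if one was used."""
    if zdict is None:
        decompressor = zlib.decompressobj()
    else:
        decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


page = (b'<!DOCTYPE html><html><head><meta charset="utf-8"><title>Hi</title>'
        b'</head><body><div class="a">Hello</div></body></html>')

plain = deflate(page)
primed = deflate(page, PRESET)
assert inflate(primed, PRESET) == page
print(len(page), len(plain), len(primed))
```

The dictionary helps most on short inputs like this one, where the sliding window has not yet seen enough data to find matches on its own.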

Brotli's sliding window is limited to 16 MiB. This enables decoding on mobile phones with limited resources, but makes Brotli underperform on compression benchmarks that include larger files. The constraints of the small window size can be alleviated by using Large Window Brotli, which is not compatible with RFC 7932 (Brotli proper). [15]
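The window limit follows from the window-size parameter in the stream header. As a small worked example, assuming the RFC 7932 definition in which WBITS ranges from 10 to 24 and the window is (1 << WBITS) - 16 bytes, the sizes can be computed as follows; the helper name is illustrative.

```python
def window_size(wbits):
    """Sliding window size in bytes for a given WBITS (assumed RFC 7932 range)."""
    if not 10 <= wbits <= 24:
        raise ValueError("WBITS must be between 10 and 24")
    return (1 << wbits) - 16


largest = window_size(24)
print(largest)                # 16777200, i.e. 16 MiB minus 16 bytes
print(window_size(10))        # 1008, the smallest window

# The Large Window Brotli extension is quoted as allowing up to 1 GiB,
# which is 64 times the 16 MiB limit of standard Brotli.
print((1 << 30) // (1 << 24))  # 64
```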

Name

While Google's zopfli implementation of the deflate compression algorithm is named after Zöpfli, the Swiss German word for a snack-sized braided buttery bread, brotli is named after Brötli, the Swiss German word for a bread roll. [4]

Google's own implementation of the Brotli specification was released under the terms of the permissive free software MIT license in 2016. Mark Adler, one of the co-authors of the zlib/gzip compression format and library, independently wrote an implementation that served as a formal validation of the Brotli specification. [5] :126 Adler's implementation was released under the terms of the similarly permissive Apache License. [16] Other implementations of the specification also exist, including one in the source-to-source Haxe language.

Applications

Brotli compression is generally used as an alternative to gzip on the web, as Brotli provides better overall compression. [17] Compared to gzip compression, JavaScript files compressed with Brotli are roughly 15% smaller, HTML files are around 20% smaller, and CSS files are around 16% smaller. [18]

The reference implementation ships a command-line program, brotli, that is similar to gzip, [19] but Brotli is rarely used as a simple file compressor in the Unix-like world. Libarchive developers find the raw stream format of .br files difficult to support, as there is no magic number to indicate the file format. [20]
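The practical consequence of the missing magic number can be sketched as follows. A gzip stream can be recognized from its first two bytes (0x1f 0x8b), while a raw Brotli stream has no such signature, so a tool can only fall back on the file name. The function below is a hypothetical illustration in Python, not libarchive's logic.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def guess_compression(name, head):
    """Guess the format from leading bytes, falling back to the file name."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    # Raw Brotli streams carry no signature, so content sniffing cannot
    # identify them; the ".br" suffix is the only hint available here.
    if name.lower().endswith(".br"):
        return "brotli (guessed from file name)"
    return "unknown"


sample = gzip.compress(b"hello")
print(guess_compression("data.bin", sample[:2]))
print(guess_compression("page.html.br", b"\x00\x00"))
```

A tool that inspects only file contents, as the libarchive issue describes, has no equivalent of the first branch for .br files.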

Industry support

Browsers and other clients

Web servers

References

  1. "Release 1.1.0". 31 August 2023. Retrieved 18 September 2023.
  2. Sheeter, Rod (February 18, 2015), "Smaller Fonts with WOFF 2.0 and unicode-range", Google Open Source Blog, Mountain View, CA: opensource.googleblog.com.
  3. Alakuijala, Jyrki; Kliuchnikov, Evgenii; Szabadka, Zoltan; Vandevenne, Lode (22 September 2015), "Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms" (PDF), The Comprehensive R Archive Network, r-project.org.
  4. Szabadka, Zoltan (September 22, 2015), "Introducing Brotli: a new compression algorithm for the internet", Google Open Source Blog, Mountain View, CA: opensource.googleblog.com.
  5. Alakuijala, Jyrki; Szabadka, Zoltan (2016), RFC 7932: Brotli Compressed Data Format, Internet Engineering Task Force Request for Comments, Fremont, CA: IETF Trust.
  6. "W3C Receives Emmy Award for Standardizing Font Technology". 2022-06-01.
  7. "Changing the face of the web: W3C Web Fonts Working Group and MPEG recognized with a Technology & Engineering Emmy Award". 2022-06-01.
  8. "Can I use... - Brotli". 2022-06-28.
  9. Lossless compression with Brotli in Rust for a bit of Pied Piper on the backend, Daniel Reiter Horn and Mehant Baid, 2016-06-29.
  10. , Rishabh Jain and Daniel Reiter Horn, 2020-08-04
  11. append to brotli compressed file, github ticket to google Brotli, listing implementation ideas, 2017-12-06
  12. Chirgwin, Richard (September 23, 2015), "Google's new squeeze: Brotli compression open-sourced", The Register, theregister.co.uk.
  13. Larkin, Henry (2007). "Word Indexing for Mobile Device Data Representations". 7th IEEE International Conference on Computer and Information Technology (CIT 2007). pp. 399–404. doi:10.1109/CIT.2007.22. ISBN 978-0-7695-2983-7. S2CID 8707991.
  14. Alakuijala, Jyrki (May 15, 2021). "Static dictionary is not why Brotli reaches excellent compression density. Much ..." Hacker News.
  15. Kliuchnikov, Eugene. "How to use large window sizes? · Issue #639 · google/brotli". GitHub. Currently we are testing "Large Window Brotli" extension that will allow up to 1GiB window. [...] "Large Window Brotli" is landed.
  16. Adler, Mark (Jan 26, 2015), "Brotli specification review and verification", Adler brotli, San Francisco: GitHub.
  17. Calvano, Paul (2018-07-25). "Brotli Compression: How Much Will It Reduce Your Content?". Retrieved 2021-03-07.
  18. Pandjarov, Hristo (2021-01-13). "More Site Speed Gains with Brotli Compression Algorithm". SiteGround. Retrieved 2021-03-07.
  19. "brotli(1) manual page". manned.org.
  20. "Brotli support · Issue #1238 · libarchive/libarchive". GitHub. Without a magic signature, libarchive cannot automatically recognize the file type, so it cannot automatically decompress. (Libarchive does not consider the file name, only the contents.)
  21. Goodger, Ben; et al. (26 January 2016), "Firefox 44 release notes", Mozilla Firefox, Mozilla Foundation.
  22. Baheux, Kenji (15 January 2016), "Accept-encoding: br on HTTPS connection", Chrome Platform Status, chromestatus.com.
  23. Trace, Rob (December 20, 2016), "Introducing Brotli compression in Microsoft Edge", Microsoft Edge Developer, blogs.windows.com.
  24. Stenberg, Daniel; et al. "curl - Changes". curl.haxx.se. Retrieved 14 January 2018.
  25. "README". GitHub. 15 October 2021.
  26. "Google Brotli: How to compress, open, extract BR files".
  27. "Changes with Apache 2.4.26", Apache HTTPD repository, svn.apache.org.
  28. "Higher Compression Ratio with Brotli compression". 6 Oct 2023.
  29. "Caching with Azure Front Door". docs.microsoft.com. 15 June 2023.
  30. "Azure Front Door Service is now available". azure.microsoft.com.
  31. "Amazon CloudFront announces support for Brotli compression". aws.amazon.com.
  32. "What will Cloudflare compress?". support.cloudflare.com.
  33. "lighttpd 1.4.56 release info". redmine.lighttpd.net.
Notes
 - Finley, Klint (22 September 2015), "Hooli, I Mean Google, Gives Away Compression Code for Free", Wired Online, wired.com.