HTTP compression

Last updated

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization. [1]

Contents

HTTP data is compressed before it is sent from the server: compliant browsers will announce what methods are supported to the server before downloading the correct format; browsers that do not support compliant compression method will download uncompressed data. The most common compression schemes include gzip and Brotli; a full list of available schemes is maintained by the IANA. [2]

There are two different ways compression can be done in HTTP. At a lower level, a Transfer-Encoding header field may indicate the payload of an HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression to avoid triggering bugs in servers. [3]

Compression scheme negotiation

The negotiation is done in two steps, described in RFC 2616 and RFC 9110:

1. The web client advertises which compression schemes it supports by including a list of tokens in the HTTP request. For Content-Encoding, the list is in a field called Accept-Encoding; for Transfer-Encoding, the field is called TE.

GET/encrypted-areaHTTP/1.1Host:www.example.comAccept-Encoding:gzip, deflate

2. If the server supports one or more compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server will add a Content-Encoding or Transfer-Encoding field in the HTTP response with the used schemes, separated by commas.

HTTP/1.1200OKDate:mon, 26 June 2016 22:38:34 GMTServer:Apache/1.3.3.7 (Unix)  (Red-Hat/Linux)Last-Modified:Wed, 08 Jan 2003 23:11:55 GMTAccept-Ranges:bytesContent-Length:438Connection:closeContent-Type:text/html; charset=UTF-8Content-Encoding:gzip

The web server is by no means obligated to use any compression method – this depends on the internal settings of the web server and also may depend on the internal architecture of the website in question.

Content-Encoding tokens

The official list of tokens available to servers and client is maintained by IANA, [4] and it includes:

In addition to these, a number of unofficial or non-standardized tokens are used in the wild by either servers or clients:

Servers that support HTTP compression

Many content delivery networks also implement HTTP compression to improve speedy delivery of resources to end users.

The compression in HTTP can also be achieved by using the functionality of server-side scripting languages like PHP, or programming languages like Java.

Various online tools exist to verify a working implementation of HTTP compression. These online tools usually request multiple variants of a URL, each with different request headers (with varying Accept-Encoding content). HTTP compression is considered to be implemented correctly when the server returns a document in a compressed format. [17] By comparing the sizes of the returned documents, the effective compression ratio can be calculated (even between different compression algorithms).

Problems preventing the use of HTTP compression

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted [18] daily due to increase in page load time when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overcautious web browsers), where servers are misconfigured, and where browser bugs stop compression being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy – a common configuration in corporate environments – was the mainstream browser most prone to failing back to uncompressed HTTP. [18]

Another problem found while deploying HTTP compression on large scale is due to the deflate encoding definition: while HTTP 1.1 defines the deflate encoding as data compressed with deflate (RFC 1951) inside a zlib formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream, [19] making its deployment unreliable. [20] [21] For this reason, some software, including the Apache HTTP Server, only implement gzip encoding.

Security implications

Compression allows a form of chosen plaintext attack to be performed: if an attacker can inject any chosen content into the page, they can know whether the page contains their given content by observing the size increase of the encrypted stream. If the increase is smaller than expected for random injections, it means that the compressor has found a repeat in the text, i.e. the injected content overlaps the secret information. This is the idea behind CRIME.

In 2012, a general attack against the use of data compression, called CRIME, was announced. While the CRIME attack could work effectively against a large number of protocols, including but not limited to TLS, and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated and largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined.

In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker tricks the victim into visiting a malicious web link. [22] All versions of TLS and SSL are at risk from BREACH regardless of the encryption algorithm or cipher used. [23] Unlike previous instances of CRIME, which can be successfully defended against by turning off TLS compression or SPDY header compression, BREACH exploits HTTP compression which cannot realistically be turned off, as virtually all web servers rely upon it to improve data transmission speeds for users. [22]

As of 2016, the TIME attack and the HEIST attack are now public knowledge. [24] [25] [26] [27]

Related Research Articles

gzip GNU file compression/decompression tool

gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and intended for use by GNU. Version 0.1 was first publicly released on 31 October 1992, and version 1.0 followed in February 1993.

<span class="mw-page-title-main">HTTP</span> Application protocol for distributed, collaborative, hypermedia information systems

The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser.

<span class="mw-page-title-main">HTTPS</span> Extension of the HTTP communications protocol to support TLS encryption

Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It uses encryption for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is encrypted using Transport Layer Security (TLS) or, formerly, Secure Sockets Layer (SSL). The protocol is therefore also referred to as HTTP over TLS, or HTTP over SSL.

zlib DEFLATE codec library

zlib is a software library used for data compression as well as a data format. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also a crucial component of many software platforms, including Linux, macOS, and iOS. It has also been used in gaming consoles such as the PlayStation 4, PlayStation 3, Wii U, Wii, Xbox One and Xbox 360.

The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client–server model architecture using separate control and data connections between the client and the server. FTP users may authenticate themselves with a plain-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that protects the username and password, and encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with SSH File Transfer Protocol (SFTP).

In computing, Deflate is a lossless data compression file format that uses a combination of LZ77 and Huffman coding. It was designed by Phil Katz, for version 2 of his PKZIP archiving tool. Deflate was later specified in RFC 1951 (1996).

Transport Layer Security (TLS) is a cryptographic protocol designed to provide communications security over a computer network. The protocol is widely used in applications such as email, instant messaging, and voice over IP, but its use in securing HTTPS remains the most publicly visible.

lighttpd

lighttpd is an open-source web server optimized for speed-critical environments while remaining standards-compliant, secure and flexible. It was originally written by Jan Kneschke as a proof-of-concept of the c10k problem – how to handle 10,000 connections in parallel on one server, but has gained worldwide popularity. Its name is a portmanteau of "light" and "httpd".

Variant objects in the context of HTTP are objects served by an Origin Content Server in a type of transmitted data variation.

mod_deflate is an optional module for the Apache HTTP Server, Apache v2.0 and later. It is based on Deflate lossless data compression algorithm that uses a combination of the LZ77 algorithm and Huffman coding. This module provides the DEFLATE output filter that allows output from Apache HTTP server to be compressed before being sent to the client over the network. It also provides a filter for decompressing a gzip compressed response body.

mod_gzip is an external extension module for the Apache HTTP Server v1 and v2.

SPDY is an obsolete open-specification communication protocol developed for transporting web content. SPDY became the basis for HTTP/2 specification. However, HTTP/2 diverged from SPDY and eventually HTTP/2 subsumed all usecases of SPDY. After HTTP/2 was ratified as a standard, major implementers, including Google, Mozilla, and Apple, deprecated SPDY in favor of HTTP/2. Since 2021, no modern browser supports SPDY.

<span class="mw-page-title-main">WebSocket</span> Computer network protocol

WebSocket is a computer communications protocol, providing simultaneous two-way communication channels over a single Transmission Control Protocol (TCP) connection. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011. The current specification allowing web applications to use this protocol is known as WebSockets. It is a living standard maintained by the WHATWG and a successor to The WebSocket API from the W3C.

HTTP/2 is a major revision of the HTTP network protocol used by the World Wide Web. It was derived from the earlier experimental SPDY protocol, originally developed by Google. HTTP/2 was developed by the HTTP Working Group of the Internet Engineering Task Force (IETF). HTTP/2 is the first new version of HTTP since HTTP/1.1, which was standardized in RFC 2068 in 1997. The Working Group presented HTTP/2 to the Internet Engineering Steering Group (IESG) for consideration as a Proposed Standard in December 2014, and IESG approved it to publish as Proposed Standard on February 17, 2015. The initial HTTP/2 specification was published as RFC 7540 on May 14, 2015.

CRIME is a security vulnerability in HTTPS and SPDY protocols that utilize compression, which can leak the content of secret web cookies. When used to recover the content of secret authentication cookies, it allows an attacker to perform session hijacking on an authenticated web session, allowing the launching of further attacks. CRIME was assigned CVE-2012-4929.

<span class="mw-page-title-main">Zopfli</span> Data compression software

Zopfli is a data compression library that performs Deflate, gzip and zlib data encoding. It achieves higher compression ratios than mainstream Deflate and zlib implementations at the cost of being slower. Google first released Zopfli in February 2013 under the terms of Apache License 2.0.

BREACH is a security vulnerability against HTTPS when using HTTP compression. BREACH is built based on the CRIME security exploit. BREACH was announced at the August 2013 Black Hat conference by security researchers Angelo Prado, Neal Harris and Yoel Gluck. The idea had been discussed in community before the announcement.

Brotli is a lossless data compression algorithm developed by Google. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling. Brotli is primarily used by web servers and content delivery networks to compress HTTP content, making internet websites load faster. A successor to gzip, it is supported by all major web browsers and has become increasingly popular, as it provides better compression than gzip.

SDCH is a data compression algorithm created by Google, based on VCDIFF.

References

  1. "Using HTTP Compression (IIS 6.0)". Microsoft Corporation. Retrieved 9 February 2010.
  2. RFC 2616, Section 3.5: "The Internet Assigned Numbers Authority (IANA) acts as a registry for content-coding value tokens."
  3. 'RFC2616 "Transfer-Encoding: gzip, chunked" not handled properly', Chromium Issue 94730
  4. "Hypertext Transfer Protocol Parameters - HTTP Content Coding Registry". IANA. Retrieved 18 April 2014.
  5. "Compression Tests: Results". Verve Studios, Co. Archived from the original on 21 March 2012. Retrieved 19 July 2012.
  6. "JSR 200: Network Transfer Format for Java Archives". The Java Community Process Program.
  7. "ModCompress - Lighttpd". lighty labs. Retrieved 18 April 2014.
  8. elinks LZMA decompression
  9. "[MS-PCCRTP]: Peer Content Caching and Retrieval: Hypertext Transfer Protocol (HTTP) Extensions". Microsoft. Retrieved 19 April 2014.
  10. "rproxy: Protocol Definition for HTTP rsync Encoding". rproxy.samba.org.
  11. "[MS-XCA]: Xpress Compression Algorithm" . Retrieved 29 August 2015.
  12. "LZMA2 Compression - MozillaWiki" . Retrieved 18 April 2014.
  13. "mget GitHub project page". GitHub . Retrieved 6 January 2017.
  14. "mod_deflate - Apache HTTP Server Version 2.4 - Supported Encodings".
  15. "Extra part of Hiawatha webserver's manual".
  16. "Serving static files part of Armeria's documentation".
  17. "How does the gzip compression check work?". httptools.dev, retrieved 10 April 2022.
  18. 1 2 "Use compression to make the web faster". Google Inc. Retrieved 22 May 2013.
  19. "deflate - Why are major web sites using gzip?". Stack Overflow. Retrieved 18 April 2014.
  20. "Compression Tests: About". Verve Studios. Archived from the original on 2 January 2015. Retrieved 18 April 2014.
  21. "Lose the wait: HTTP Compression". Zoompf Web Performance. Retrieved 18 April 2014.
  22. 1 2 Goodin, Dan (1 August 2013). "Gone in 30 seconds: New attack plucks secrets from HTTPS-protected pages". Ars Technica. Condé Nast. Retrieved 2 August 2013.
  23. Leyden, John (2 August 2013). "Step into the BREACH: New attack developed to read encrypted web data". The Register. Retrieved 2 August 2013.
  24. Sullivan, Nick (11 August 2016). "CRIME, TIME, BREACH and HEIST: A brief history of compression oracle attacks on HTTPS" . Retrieved 16 August 2016.
  25. Goodin, Dan (3 August 2016). "HEIST exploit — New attack steals SSNs, e-mail addresses, and more from HTTPS pages" . Retrieved 16 August 2016.
  26. Be'ery, Tal. "A Perfect Crime? TIME will tell" (PDF).
  27. Vanhoef, Mathy. "HEIST: HTTP Encrypted Information can be Stolen through TCP-windows" (PDF).