Web cache

Last updated

A Web cache (or HTTP cache) is a system for optimizing the World Wide Web. It is implemented both client-side and server-side. The caching of multimedia and other files can result in less overall delay when browsing the Web. [1] [2]

Contents

Parts of the system

Forward and reverse

A forward cache is a cache outside the web server's network, e.g. in the client's web browser, in an ISP, or within a corporate network. [3] A network-aware forward cache only caches heavily accessed items. [4] A proxy server sitting between the client and web server can evaluate HTTP headers and choose whether to store web content.

A reverse cache sits in front of one or more web servers, accelerating requests from the Internet and reducing peak server load. This is usually a content delivery network (CDN) that retains copies of web content at various points throughout a network.

HTTP options

The Hypertext Transfer Protocol (HTTP) defines three basic mechanisms for controlling caches: freshness, validation, and invalidation. [5] This is specified in the header of HTTP response messages from the server.

Freshness allows a response to be used without re-checking it on the origin server, and can be controlled by both the server and the client. For example, the Expires response header gives a date when the document becomes stale, and the Cache-Control: max-age directive tells the cache how many seconds the response is fresh for.

Validation can be used to check whether a cached response is still good after it becomes stale. For example, if the response has a Last-Modified header, a cache can make a conditional request using the If-Modified-Since header to see if it has changed. The ETag (entity tag) mechanism also allows for both strong and weak validation.

Invalidation is usually a side effect of another request that passes through the cache. For example, if a URL associated with a cached response subsequently gets a POST, PUT or DELETE request, the cached response will be invalidated. Many CDNs and manufacturers of network equipment have replaced this standard HTTP cache control with dynamic caching.

Legality

In 1998, the Digital Millennium Copyright Act added rules to the United States Code (17 U.S.C. §: 512) that exempts system operators from copyright liability for the purposes of caching.

Server-side software

This is a list of server-side web caching software.

NameOperating systemForward
mode
Reverse
mode
License
Windows Unix-like Other
Apache HTTP Server YesOS X, Linux, Unix, FreeBSD, Solaris, Novell NetWareOS/2, TPF, OpenVMS, eComStationYes Apache 2.0
aiScaler Dynamic Cache ControlNo Linux No Proprietary
ApplianSys CACHEbox No Linux No Proprietary
Blue Coat ProxySGNoNoSGOSYesYes Proprietary
Nginx Yes Linux, BSD, OS X, Solaris, AIX, HP-UX YesYesYes2-clause BSD-like
Microsoft Forefront Threat Management Gateway YesNoNoYesYes Proprietary
Polipo Yes OS X, Linux, OpenWrt, FreeBSD  ?YesYes MIT License
Squid Yes Linux  ?YesYes GPL
Apache Traffic Server  ? Linux  ?YesYes Apache 2.0
UntangleNo Linux NoYesYes Proprietary
Varnish No Linux NoNeeds a VMODYes BSD
WinGate YesNoNoYesYes Proprietary (Free for 8 users)
NusterNo Linux NoYesYes GPL
McAfee Web GatewayNoMcAfee Linux Operating SystemNoYesYes Proprietary

See also

Related Research Articles

<span class="mw-page-title-main">Cache (computing)</span> Additional storage that enables faster access to main storage

In computing, a cache is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A cache hit occurs when the requested data can be found in a cache, while a cache miss occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.

<span class="mw-page-title-main">HTTP</span> Application protocol for distributed, collaborative, hypermedia information systems

HTTP is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser.

<span class="mw-page-title-main">Web server</span> Computer software that distributes web pages

A web server is computer software and underlying hardware that accepts requests via HTTP or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiates communication by making a request for a web page or other resource using HTTP, and the server responds with the content of that resource or an error message. A web server can also accept and store resources sent from the user agent if configured to do so.

<span class="mw-page-title-main">Proxy server</span> Computer server that makes and receives requests on behalf of a user

In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource. It improves privacy, security, and possibly performance in the process.

<span class="mw-page-title-main">Squid (software)</span> Caching and forwarding HTTP web proxy

Squid is a caching and forwarding HTTP web proxy. It has a wide variety of uses, including speeding up a web server by caching repeated requests, caching World Wide Web (WWW), Domain Name System (DNS), and other network lookups for a group of people sharing network resources, and aiding security by filtering traffic. Although used for mainly HTTP and File Transfer Protocol (FTP), Squid includes limited support for several other protocols including Internet Gopher, Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Hypertext Transfer Protocol Secure (HTTPS). Squid does not support the SOCKS protocol, unlike Privoxy, with which Squid can be used in order to provide SOCKS support.

URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. Similarly, domain redirection or domain forwarding is when all pages in a URL domain are redirected to a different domain, as when wikipedia.com and wikipedia.net are automatically redirected to wikipedia.org.

<span class="mw-page-title-main">Content delivery network</span> Layer in the internet ecosystem addressing bottlenecks

A content delivery network or content distribution network (CDN) is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance ("speed") by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means for alleviating the performance bottlenecks of the Internet as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today, including web objects, downloadable objects, applications, live streaming media, on-demand streaming media, and social media sites.

<span class="mw-page-title-main">HTTP pipelining</span> Computer communication technique

HTTP pipelining is a feature of HTTP/1.1, which allows multiple HTTP requests to be sent over a single TCP connection without waiting for the corresponding responses. HTTP/1.1 requires servers to respond to pipelined requests correctly, with non-pipelined but valid responses even if server does not support HTTP pipelining. Despite this requirement, many legacy HTTP/1.1 servers do not support pipelining correctly, forcing most HTTP clients to not use HTTP pipelining.

<span class="mw-page-title-main">Reverse proxy</span> Type of proxy server

In computer networks, a reverse proxy or surrogate server is a proxy server that appears to any client to be an ordinary web server, but in reality merely acts as an intermediary that forwards the client's requests to one or more ordinary web servers. Reverse proxies help increase scalability, performance, resilience, and security, but they also carry a number of risks.

<span class="mw-page-title-main">HTTP compression</span> Capability that can be built into web servers and web clients

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

<span class="mw-page-title-main">HTTP ETag</span> Communications protocol

The ETag or entity tag is part of HTTP, the protocol for the World Wide Web. It is one of several mechanisms that HTTP provides for Web cache validation, which allows a client to make conditional requests. This mechanism allows caches to be more efficient and saves bandwidth, as a Web server does not need to send a full response if the content has not changed. ETags can also be used for optimistic concurrency control to help prevent simultaneous updates of a resource from overwriting each other.

<span class="mw-page-title-main">HTTP 403</span> HTTP status code indicating that access is forbidden to a resource

HTTP 403 is an HTTP status code meaning access to the requested resource is forbidden. The server understood the request, but will not fulfill it, if it was correct.

<span class="mw-page-title-main">Polipo</span>

Polipo is a discontinued lightweight caching and forwarding web proxy server. It has a wide variety of uses, from aiding security by filtering traffic; to caching web, DNS and other computer network lookups for a group of people sharing network resources; to speeding up a web server by caching repeated requests. It can be configured to use on-disk cache and serve cached content when offline and perform various forms of content filtering.

<span class="mw-page-title-main">WebSocket</span> Computer network protocol

WebSocket is a computer communications protocol, providing a simultaneous two-way communication channel over a single Transmission Control Protocol (TCP) connection. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011. The current specification allowing web applications to use this protocol is known as WebSockets. It is a living standard maintained by the WHATWG and a successor to The WebSocket API from the W3C.

Web performance refers to the speed in which web pages are downloaded and displayed on the user's web browser. Web performance optimization (WPO), or website optimization is the field of knowledge about increasing web performance.

Dynamic Site Acceleration (DSA) is a group of technologies which make the delivery of dynamic websites more efficient. Manufacturers of application delivery controllers and content delivery networks (CDNs) use a host of techniques to accelerate dynamic sites, including:

Client Hints are an extension to the existing Hypertext Transfer Protocol (HTTP) that allows web servers to ask the client for information about its configuration. The client can choose to respond to this request by advertising the requested information about itself through sending the data using a specific part of the HTTP protocol called HTTP Header fields or by exposing the same information to the JavaScript code being executed on a web page. This can then help the server tailor its responses to the client; for example, a server can choose to send a smaller image if a client advertises that they have a very small screen.

Gemini is an application-layer internet communication protocol for accessing remote documents, similar to HTTP and Gopher. It comes with a special document format, commonly referred to as "gemtext", which allows linking to other documents. Started by a pseudonymous person known as Solderpunk, the protocol is being finalized collaboratively and as of October 2022, has not been submitted to the IETF organization for standardization.

References

  1. Fountis, Yorgos (4 May 2017). "How does the browser cache work?".
  2. Messaoud, S.; Youssef, H. (2009). "An analytical model for the performance evaluation of stack-based Web cache replacement algorithms". International Journal of Communication Systems. 23: 1–22. doi:10.1002/dac.1036. S2CID   46507769.
  3. Shinder, Thomas (2 September 2008). "Understanding Web Caching Concepts for the ISA Firewall". ISA Server . TechGenix Ltd. Archived from the original on 23 July 2011. Retrieved 27 February 2011.
  4. Erman, Jeffrey; Gerber, Alexandre; Hajiaghayi, Mohammad T.; Pei, Dan; Spatscheck, Oliver (2008). "Network-Aware Forward Caching" (PDF). AT&T Labs : 291–300. CiteSeerX   10.1.1.159.1786 . Archived from the original (PDF) on 1 April 2011. Retrieved 11 March 2019.
  5. Kelly, Mike; Hausenblas, Michael. "Using HTTP Link: Header for Gateway Cache Invalidation" (PDF). WS-REST. p. 20. Archived from the original (PDF) on 10 July 2010. Retrieved 14 June 2013.

Further reading