Content negotiation

Last updated January 02, 2025

Content negotiation refers to mechanisms defined as part of HTTP that make it possible to serve different versions of a document (or, more generally, representations of a resource) at the same URI, so that user agents can specify which version best fits their capabilities. One classical use of this mechanism is to serve an image in both GIF and PNG formats, so that a browser that cannot display PNG images (e.g. MS Internet Explorer 4) will be served the GIF version.

A resource may be available in several different representations; for example, it might be available in different languages or different media types. One way of selecting among them is to give the user an index page and let them pick the most appropriate one; however, it is often possible to automate the choice based on some selection criteria.

Mechanisms

HTTP provides several content negotiation mechanisms, including server-driven (or proactive), agent-driven (or reactive), transparent, and hybrid combinations of these.

Server-driven

Server-driven or proactive content negotiation is performed by algorithms on the server which choose among the possible variant representations. This is commonly performed based on user agent-provided acceptance criteria.

To summarize how this works, when a user agent submits a request to a server, the user agent informs the server which media types or other aspects of content presentation it understands, with ratings of how well it understands them. More precisely, the user agent provides HTTP headers that list acceptable aspects of the resource and quality factors for them. The server is then able to supply the version of the resource that best fits the user agent's needs.

For example, a browser could indicate that it would like information in German by setting the Accept-Language header like this:

Accept-Language: de

The browser may instead say that German is preferred, if possible, but that English is also acceptable by setting:

Accept-Language: de; q=1.0, en; q=0.5

Here the quality factor ('q') for German is higher than that for English.
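As a rough illustration of how a server might read such a header, the following minimal sketch splits an Accept-style value into entries and orders them by quality factor. The function name parse_accept is hypothetical, and the sketch assumes a simplified header grammar (comma-separated entries, optional q parameter, default quality of 1.0 when q is omitted); it is not a complete implementation of the RFC 7231 syntax.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-style header into (value, q) pairs, best first.

    Hypothetical helper for illustration; real parsers handle more syntax.
    """
    items = []
    for part in header.split(","):
        value, *params = [p.strip() for p in part.split(";")]
        if not value:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # the quality factor defaults to 1.0 when omitted
        for param in params:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(raw)
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        items.append((value, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


print(parse_accept("de; q=1.0, en; q=0.5"))  # [('de', 1.0), ('en', 0.5)]
```

Sorting by quality alone is a simplification; a server would still need to match each entry against the representations it actually has.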

Several HTTP headers are often sent together to negotiate media type, language, and a few other aspects of a resource. In addition to the commonly used Accept header for media types and the Accept-Language header for languages, RFC 7231 also describes Accept-Charset and Accept-Encoding for character encodings and content codings (compression), respectively.

An example of a more complex request is one in which a browser sends headers indicating that German is preferred but English is acceptable, as above, and that, regarding formats, HTML (text/html) is preferred over other text types (text/*), GIF (image/gif) or JPEG (image/jpeg) images are preferred over other image formats (image/*), and any other media type (*/*) is accepted as a last resort:

Accept-Language: de;q=1.0, en;q=0.5
Accept: text/html;q=1.0, text/*;q=0.8, image/gif;q=0.6, image/jpeg;q=0.6, image/*;q=0.5, */*;q=0.1

In addition to the aspects of server-driven content negotiation by content type and by language specified in RFC 7231, there are extensions defining other aspects of content negotiation, such as Memento, which describes the use of an Accept-Datetime header to retrieve versions of a resource's representation at particular points in time^[1], and the IETF/W3C's Content Negotiation by Profile^[2], which describes the use of an Accept-Profile header to retrieve resource representations conforming to data profiles.
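As a small sketch of the Memento case, the snippet below builds an Accept-Datetime request header. It assumes that the header value uses the usual HTTP date format expressed in GMT, which is how the Memento specification's examples appear; the function name accept_datetime_header is hypothetical, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(moment: datetime) -> dict[str, str]:
    """Build an Accept-Datetime header for the given moment (hypothetical helper).

    The moment is converted to UTC first; a naive datetime is interpreted
    as local time by astimezone().
    """
    utc = moment.astimezone(timezone.utc)
    # usegmt=True renders the zone as "GMT", as HTTP dates require
    return {"Accept-Datetime": format_datetime(utc, usegmt=True)}


headers = accept_datetime_header(datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc))
print(headers)  # {'Accept-Datetime': 'Thu, 31 May 2007 20:35:00 GMT'}
```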

Neither RFC 7231 nor the more recent related specifications such as Content Negotiation by Profile^[2] specify how to resolve trade-offs in cases where different headers specify conflicting requirements, such as, in the above example, choosing between an HTML page in English and a GIF image in German.

Agent-driven

Agent-driven or reactive content negotiation is performed by algorithms in the user agent which choose among the possible variant representations. This is commonly performed based on a server-provided list of representations and metadata about them.

To summarize how this works, when a user agent submits a request to a server, the server informs the user agent which representations it has available, as well as any metadata it has about each representation (e.g., content type, quality, language). The user agent then resubmits the request to a specific URL for the chosen representation. The choice can be made automatically by the user agent, or the user agent can present the options to the user, who then chooses directly. More precisely, the server responds with either 300 Multiple Choices or 406 Not Acceptable (the latter when the user agent supplied acceptance criteria for server-driven negotiation but the server could not make a selection automatically). Unfortunately, HTTP leaves the format of the list of representations and metadata, along with the selection mechanisms, unspecified.

References

[1] Memento: Adding Time to the Web. Mementoweb.org. Retrieved on 2013-09-08.
[2] World Wide Web Consortium (W3C), "Content Negotiation by Profile", W3C Working Draft, 26 November 2019.

This article is based in part on this page (archived 2014-11-15 at the Wayback Machine), which is copyrighted by the Apache Foundation but released under a free license.

External links

RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content (Section 5.3: Content Negotiation)
RFC 2295: Transparent Content Negotiation in HTTP
RFC 2296: HTTP Remote Variant Selection Algorithm -- RVSA/1.0
Apache Content Negotiation
Open source PHP content negotiation library (supports wildcards and q values)
Discussion about XHTML serving with content negotiation and browser concerns requiring this

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
