Content negotiation

Last updated

Content negotiation refers to mechanisms defined as a part of HTTP that make it possible to serve different versions of a document (or more generally, representations of a resource) at the same URI, so that user agents can specify which version fits their capabilities the best. One classical use of this mechanism is to serve an image in GIF or PNG format, so that a browser that cannot display PNG images (e.g. MS Internet Explorer 4) will be served the GIF version.

Contents

A resource may be available in several different representations; for example, it might be available in different languages or different media types. One way of selecting the most appropriate choice is to give the user an index page and let them select the most appropriate choice; however it is often possible to automate the choice based on some selection criteria.

Mechanisms

HTTP provides for several different content negotiation mechanisms including: server-driven (or proactive), agent-driven (or reactive), transparent, and/or hybrid combinations thereof.

Server-driven

Server-driven or proactive content negotiation is performed by algorithms on the server which choose among the possible variant representations. This is commonly performed based on user agent-provided acceptance criteria.

To summarize how this works, when a user agent submits a request to a server, the user agent informs the server what media types or other aspects of content presentation it understands with ratings of how well it understands them. More precisely, the user agent provides HTTP headers that lists acceptable aspects of the resource and quality factors for them. The server is then able to supply the version of the resource that best fits the user agent's needs.

For example, a browser could indicate that it would like information in German by setting the Accept-Language like this:

Accept-Language: de

The browser may instead say that German is preferred, if possible, but that English is also acceptable by setting:

Accept-Language: de; q=1.0, en; q=0.5

Where the 'q' - quality - factor for German is higher than that for English.

Multiple HTTP headers are often supplied together for content format or, specifically media type, language and a few other aspects of a resource. In addition to the commonly used Accept header for Media Type, the Accept-Language header for language negotiation, RFC 7231 also describes Accept-Charset & Accept-Encodings for character encodings and content codings (compression) respectively.

An example of a more complex request is where a browser sends headers about language indicating German is preferred but that English is acceptable, as above, and that, regarding formats, HTML (text/html) is preferred over other text types (text/*), GIF (image/gif) or JPEG (image/jpg) images are preferred over other image formats (image/*) but that any other media type (*/*) is accepted as a last resort:

Accept-Language:de;q=1.0,en;q=0.5 Accept:text/html;q=1.0,text/*;q=0.8,image/gif;q=0.6,image/jpeg;q=0.6,image/*;q=0.5,*/*;q=0.1 

In addition to aspects of server-driven content negotiation by content type and by language specified in RFC 7231, there are extensions defining other aspects of content negotiation, such as Memento which describes use of a Accept-Datetime header to retrieve version of a resource's representation at particular points in time [1] and the IETF/W3C's Content Negotiation by Profile [2] which describes use of an Accept-Profile header to retrieve resource representations conforming to data profiles.

Neither RFC 7231 nor the more recent related specifications such as Content Negotiation by Profile [2] specify how to resolve trade-offs in cases where different headers specify conflicting requirements, such as, in the above example, choosing between an HTML page in English and a GIF image in German.

Agent-driven

Agent-driven or reactive content negotiation is performed by algorithms in the user-agent which choose among the possible variant representations. This is commonly performed based on a server provided list of representations and metadata about them.

To summarize how this works, when a user agent submits a request to a server, the server informs the user-agent which representations it has available as well as any metadata it has about each representation (e.g., content-type, quality, language, etc.). The user-agent then resubmits the request to a specific URL for the chosen representation. This can be automatically chosen by the user-agent or the user-agent can present the user with the choices and the user can directly choose such. More precisely, the server responds with either 300 Multiple Choices or 406 Not Acceptable (when server-driven, user-agent acceptance criteria are provided but the server cannot automatically make a selection). Unfortunately HTTP leaves the format of the list of representations and metadata along with selection mechanisms unspecified.

Related Research Articles

<span class="mw-page-title-main">HTML</span> HyperText Markup Language

The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It defines the meaning and structure of web content. It is often assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript.

<span class="mw-page-title-main">HTTP</span> Application protocol for distributed, collaborative, hypermedia information systems

The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, where hypertext documents include hyperlinks to other resources that the user can easily access, for example by a mouse click or by tapping the screen in a web browser.

Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

A Uniform Resource Identifier (URI) is a unique sequence of characters that identifies a logical or physical resource used by web technologies. URIs may be used to identify anything, including real-world objects, such as people and places, concepts, or information resources such as web pages and books. Some URIs provide a means of locating and retrieving information resources on a network ; these are Uniform Resource Locators (URLs). A URL provides the location of the resource. A URI identifies the resource by name at the specified location or URL. Other URIs provide only a unique name, without a means of locating or retrieving the resource or information about it, these are Uniform Resource Names (URNs). The web technologies that use URIs are not limited to web browsers. URIs are used to identify anything described using the Resource Description Framework (RDF), for example, concepts that are part of an ontology defined using the Web Ontology Language (OWL), and people who are described using the Friend of a Friend vocabulary would each have an individual URI.

An HTML element is a type of HTML document component, one of several types of HTML nodes. The first used version of HTML was written by Tim Berners-Lee in 1993 and there have since been many versions of HTML. The most commonly used version is HTML 4.01, which became official standard in December 1999. An HTML document is composed of a tree of simple HTML nodes, such as text nodes, and HTML elements, which add semantics and formatting to parts of document. Each element can have HTML attributes specified. Elements can also have content, including other elements and text.

In computing, the User-Agent header is an HTTP header intended to identify the user agent responsible for making a given HTTP request. Whereas the character sequence User-Agent comprises the name of the header itself, the header value that a given user agent uses to identify itself is colloquially known as its user agent string. The user agent for the operator of a computer used to access the Web has encoded within the rules that govern its behavior the knowledge of how to negotiate its half of a request-response transaction; the user agent thus plays the role of the client in a client–server system. Often considered useful in networks is the ability to identify and distinguish the software facilitating a network session. For this reason, the User-Agent HTTP header exists to identify the client software to the responding server.

URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. Similarly, domain redirection or domain forwarding is when all pages in a URL domain are redirected to a different domain, as when wikipedia.com and wikipedia.net are automatically redirected to wikipedia.org.

Web standards are the formal, non-proprietary standards and other technical specifications that define and describe aspects of the World Wide Web. In recent years, the term has been more frequently associated with the trend of endorsing a set of standardized best practices for building web sites, and a philosophy of web design and development that includes those methods.

Link prefetching allows web browsers to pre-load resources. This speeds up both the loading and rendering of web pages. Prefetching was first introduced in HTML5.

In computer hypertext, a URI fragment is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.

<span class="mw-page-title-main">HTTP compression</span> Capability that can be built into web servers and web clients

HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.

<span class="mw-page-title-main">HTTP referer</span> HTTP header field

In HTTP, "Referer" is an optional HTTP header field that identifies the address of the web page, from which the resource has been requested. By checking the referrer, the server providing the new web page can see where the request originated.

<span class="mw-page-title-main">HTTP ETag</span> Communications protocol

The ETag or entity tag is part of HTTP, the protocol for the World Wide Web. It is one of several mechanisms that HTTP provides for Web cache validation, which allows a client to make conditional requests. This mechanism allows caches to be more efficient and saves bandwidth, as a Web server does not need to send a full response if the content has not changed. ETags can also be used for optimistic concurrency control to help prevent simultaneous updates of a resource from overwriting each other.

<span class="mw-page-title-main">HTTP 302</span> HTTP Status Code

The HTTP response status code 302 Found is a common way of performing URL redirection. The HTTP/1.0 specification initially defined this code, and gave it the description phrase "Moved Temporarily" rather than "Found".

<span class="mw-page-title-main">POST (HTTP)</span> Request method in the HTTP protocol

In computing, POST is a request method supported by HTTP used by the World Wide Web. By design, the POST request method requests that a web server accept the data enclosed in the body of the request message, most likely for storing it. It is often used when uploading a file or when submitting a completed web form.

<span class="mw-page-title-main">Memento Project</span>

Memento is a United States National Digital Information Infrastructure and Preservation Program (NDIIPP)–funded project aimed at making Web-archived content more readily discoverable and accessible to the public.

HTML5 Audio is a subject of the HTML5 specification, incorporating audio input, playback, and synthesis, as well as, in the browser. iOS

Linked Data Notifications (LDN) is a W3C Recommendation that describes a communications protocol based on HTTP, URI, and RDF on how servers (receivers) can receive messages pushed to them by applications (senders), as well as how other applications (consumers) may retrieve those messages. Any web resource can advertise a receiving endpoint (inbox) for notification messages. Messages are expressed in RDF, and can contain arbitrary data.

References

  1. Memento: Adding Time to the Web. Mementoweb.org. Retrieved on 2013-09-08.
  2. 1 2 "World Wide Web Consortium (W3C), "Content Negotiation by Profile", W3C Working Draft, 26 November 2019".
This article is based in part on this page Archived 2014-11-15 at the Wayback Machine , which is copyrighted by the Apache Foundation but released under a free license.