HTTP referer

In HTTP, "Referer" (a misspelling of "Referrer" [1]) is an optional HTTP header field that identifies the address of the web page (i.e., the URI or IRI) from which the resource has been requested. By checking the referrer, the server providing the new web page can see where the request originated.

In the most common situation, this means that when a user clicks a hyperlink in a web browser, causing the browser to send a request to the server holding the destination web page, the request may include the Referer field, which indicates the last page the user was on (the one where they clicked the link).

Web sites and web servers log the content of the received Referer field to identify the web page from which the user followed a link, for promotional or statistical purposes. [2] This entails a loss of privacy for the user and may introduce a security risk. [3] To mitigate security risks, browsers have been steadily reducing the amount of information sent in Referer. As of March 2021, Chrome, [4] Chromium-based Edge, Firefox, [5] and Safari [6] default to sending only the origin in cross-origin requests, stripping out the path and query string so that little more than the domain name remains.
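The trimming described above can be sketched in Python. This is only an illustration under simplifying assumptions, not how any browser implements it: the function names are hypothetical, default ports are not normalized, and the additional rule of sending nothing on a downgrade from HTTPS to HTTP is left out.

```python
from urllib.parse import urlsplit


def origin_of(url: str) -> str:
    """Return scheme://host[:port]/ for url, dropping path, query and fragment."""
    parts = urlsplit(url)
    port = f":{parts.port}" if parts.port else ""
    return f"{parts.scheme}://{parts.hostname}{port}/"


def trimmed_referrer(source_url: str, target_url: str) -> str:
    """Full URL for same-origin requests, origin only for cross-origin ones."""
    source, target = urlsplit(source_url), urlsplit(target_url)
    same_origin = (source.scheme, source.hostname, source.port) == (
        target.scheme, target.hostname, target.port)
    if same_origin:
        # The fragment is never part of a referrer, so drop it.
        return source._replace(fragment="").geturl()
    return origin_of(source_url)
```

With these definitions, a link from https://example.com/a/b?q=1 to another site would carry only https://example.com/ as its referrer.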

Etymology

The misspelling of referrer was introduced in the original proposal by computer scientist Phillip Hallam-Baker to incorporate the "Referer" header field into the HTTP specification. [7] [8] The misspelling was set in stone by the time (May 1996) of its incorporation into the Request for Comments standards document RFC 1945 [9] (which 'reflects common usage of the protocol referred to as "HTTP/1.0"' at that time); document co-author Roy Fielding remarked in March 1995 that "neither one (referer or referrer) is understood by" the standard Unix spell checker of the period. [10] "Referer" has since become a widely used spelling in the industry when discussing HTTP referrers; usage of the misspelling is not universal, though, as the correct spelling "referrer" is used in some web specifications such as the Referrer-Policy HTTP header or the Document Object Model. [3]

Details

When visiting a web page, the referrer or referring page is the URL of the previous web page from which a link was followed.

More generally, a referrer is the URL of a previous item which led to this request. For example, the referrer for an image is generally the HTML page on which it is to be displayed. The referrer field is an optional part of the HTTP request sent by the web browser to the web server. [11]
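As a rough illustration of where the field lives, the following Python sketch builds (but does not send) a request for an image whose Referer names the HTML page that embeds it. It uses urllib.request from the standard library; both URLs and the helper name are placeholders invented for the example.

```python
import urllib.request

# Hypothetical URLs, used only for illustration.
PAGE_URL = "https://example.com/articles/index.html"
IMAGE_URL = "https://example.com/images/logo.png"


def build_request_with_referer(target_url: str, referring_url: str) -> urllib.request.Request:
    """Return a request for target_url whose Referer header names referring_url."""
    return urllib.request.Request(target_url, headers={"Referer": referring_url})


# The image request names the embedding page as its referrer.
request = build_request_with_referer(IMAGE_URL, PAGE_URL)
```

Because the field is optional, leaving the header out yields an equally valid request.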

Many websites log referrers as part of their attempt to track their users. Most web log analysis software can process this information. Because referrer information can violate privacy, some web browsers allow the user to disable the sending of referrer information. [12] Some proxy and firewall software will also filter out referrer information, to avoid leaking the location of non-public websites. This can, in turn, cause problems: some web servers block parts of their website to web browsers that do not send the right referrer information, in an attempt to prevent deep linking or unauthorised use of images (bandwidth theft). Some proxy software has the ability to give the top-level address of the target website as the referrer, which reduces these problems but can still in some cases divulge the user's last-visited web page.
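A server-side referrer check of the kind used against deep linking might look roughly like the Python sketch below. The host list and function name are assumptions made for the example. The allow_missing switch reflects the problem noted above: a strict check also rejects visitors whose browser or proxy removes the field.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical set of hosts that are allowed to embed or link to the resource.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_request_allowed(referer: Optional[str], allow_missing: bool = True) -> bool:
    """Decide whether to serve a resource based on the Referer header value."""
    if not referer:
        # No referrer was sent, for example because a proxy filtered it out.
        return allow_missing
    return urlsplit(referer).hostname in ALLOWED_HOSTS
```

Since the client controls the field, a check like this only discourages casual hotlinking and should not be treated as access control.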

Many blogs publish referrer information in order to link back to people who are linking to them, and hence broaden the conversation. This has led, in turn, to the rise of referrer spam: the sending of fake referrer information in order to popularize the spammer's website.

It is possible to access the referrer information on the client side using document.referrer in JavaScript. [13] This can be used, for example, to individualize a web page based on a user's search engine query. However, the referrer field does not always include search keywords, such as when using Google Search with HTTPS. [14]
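Extracting a search query from a referrer string can be sketched as follows. The example is written in Python and operates on the same kind of URL that document.referrer exposes. The parameter name "q" and the search URL are assumptions; the function returns None when the referrer carries no such parameter, as in the HTTPS case mentioned above.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def search_terms_from_referrer(referrer: str, param: str = "q") -> Optional[str]:
    """Return the search terms found in the referrer's query string, if any."""
    query = urlsplit(referrer).query
    values = parse_qs(query).get(param)
    return values[0] if values else None
```

A page could then fall back to its default content whenever the function returns None.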

Referrer hiding

Most web servers maintain logs of all traffic, and record the HTTP referrer sent by the web browser for each request. This raises a number of privacy concerns, and as a result, a number of systems to prevent web servers being sent the real referring URL have been developed. These systems work either by blanking the referrer field or by replacing it with inaccurate data. Generally, Internet-security suites blank the referrer data, while web-based servers replace it with a false URL, usually their own. This raises the problem of referrer spam. The technical details of both methods are fairly consistent – software applications act as a proxy server and manipulate the HTTP request, while web-based methods load websites within frames, causing the web browser to send a referrer URL of their website address. Some web browsers give their users the option to turn off referrer fields in the request header. [12]
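The two approaches, blanking the field and replacing it, can be illustrated with a short Python sketch of what a filtering proxy might do to the request headers before forwarding them. The function name and the representation of headers as a plain dictionary are assumptions made for the example.

```python
from typing import Dict, Optional


def rewrite_referer(headers: Dict[str, str], replacement: Optional[str] = None) -> Dict[str, str]:
    """Return a copy of headers with the Referer field removed or replaced."""
    # Header names are case-insensitive, so compare in lower case.
    cleaned = {name: value for name, value in headers.items()
               if name.lower() != "referer"}
    if replacement is not None:
        cleaned["Referer"] = replacement
    return cleaned
```

Calling the function without a replacement blanks the field; passing a URL substitutes it for the real referring page.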

Most web browsers do not send the referrer field when they are instructed to redirect using the "Refresh" field. Exceptions include some versions of Opera and many mobile web browsers. However, this method of redirection is discouraged by the World Wide Web Consortium (W3C). [15]

If a website is accessed from an HTTP Secure (HTTPS) connection and a link points to anywhere except another secure location, then the referrer field is not sent. [11]
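The rule can be expressed as a small Python predicate. This is a sketch that treats "secure" as meaning only the https scheme, which is a simplification, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def should_send_referer(source_url: str, target_url: str) -> bool:
    """Return False when following the link would downgrade from HTTPS to a non-secure target."""
    source_secure = urlsplit(source_url).scheme == "https"
    target_secure = urlsplit(target_url).scheme == "https"
    return not (source_secure and not target_secure)
```

Links between two HTTPS pages, and links that start on a plain HTTP page, are unaffected by this rule.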

The HTML5 standard added support for the attribute/value rel="noreferrer", which instructs the user agent to not send a referrer. [16]

Another referrer hiding method is to convert the original link URL to a Data URI scheme-based URL containing a small HTML page with a meta refresh to the original URL. When the user is redirected from the data: page, the original referrer is hidden.
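A minimal Python sketch of building such a URL is shown below. The function name is hypothetical and Base64 encoding is just one possible way to embed the page. Whether a given browser will actually navigate to the resulting data: URL is outside the scope of the sketch.

```python
import base64
import html


def data_uri_redirect(target_url: str) -> str:
    """Wrap target_url in a data: URL holding a page that meta-refreshes to it."""
    page = (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="refresh" content="0; url={html.escape(target_url, quote=True)}">'
        "</head><body></body></html>"
    )
    encoded = base64.b64encode(page.encode("utf-8")).decode("ascii")
    return f"data:text/html;charset=utf-8;base64,{encoded}"
```

The returned string would be used as the href of the link in place of the original URL.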

Content Security Policy standard version 1.1 introduced a new referrer directive that allows more control over the browser's behaviour with regard to the referrer header. Specifically, it allows the webmaster to instruct the browser not to send the referrer at all, or to reveal it only when navigating within the same origin, among other options. [17]

References

  1. Gourley, David; Totty, Brian; Sayer, Marjorie; Aggarwal, Anshu; Reddy, Sailu (27 September 2002). HTTP: The Definitive Guide. O'Reilly Media, Inc. ISBN 9781565925090.
  2. Kyrnin, Jennifer (2012-04-10). "Referrer - What is a Referrer - How do HTTP Referrers Work?". About.com. Archived from the original on 2013-05-29. Retrieved 2013-03-20.
  3. "Does your website have a leak?". ICO Blog. 2015-09-16. Archived from the original on 2018-05-24. Retrieved 2018-08-16.
  4. "Referrer Policy: Default to strict-origin-when-cross-origin - Chrome Platform Status". www.chromestatus.com. Retrieved 2021-03-23.
  5. Lee, Dimi; Kerschbaumer, Christoph (22 March 2021). "Firefox 87 trims HTTP Referrers by default to protect user privacy". Mozilla Security Blog. Retrieved 2021-03-23.
  6. Wilander, John (2019-12-10). "Preventing Tracking Prevention Tracking". WebKit blog.
  7. Hallam-Baker, Phillip (2000-09-21). "Re: Is Al Gore The Father of the Internet?". Newsgroup: alt.folklore.computers. Retrieved 2013-03-20.
  8. Hallam-Baker, Phillip. "Re: Referer: (sic)". W3C Public mailing list archives. Archived from the original on 2024-02-19. Retrieved 19 February 2024.
  9. Berners-Lee, T.; Fielding, R.; Frystyk, H. (May 1996). Hypertext Transfer Protocol -- HTTP/1.0. IETF. doi:10.17487/RFC1945. RFC 1945.
  10. Fielding, Roy (1995-03-09). "Re: referer: (sic)". ietf-http-wg-old (Mailing list). Retrieved 2013-03-20.
  11. Fielding, R.; Reschke, J., eds. (June 2014). Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content: referrer (RFC 7231 § 5.5.2). IETF. sec. 5.5.2. doi:10.17487/RFC7231. S2CID 14399078. RFC 7231. Retrieved 2014-07-26.
  12. "Network.http.sendRefererHeader". MozillaZine. 2007-06-10. Retrieved 2015-05-27.
  13. "HTML DOM Document referrer Property". W3Schools. Retrieved 2013-03-20.
  14. Gundersen, Bret (2011-10-19). "The Impact of Google Encrypted Search". Adobe Digital Marketing Blog. Retrieved 2021-03-17.
  15. "HTML Techniques for Web Content Accessibility Guidelines 1.0: The META element". W3C. 2000-11-06. Retrieved 2013-03-20.
  16. "4.12 Links — HTML Living Standard: 4.12.5.8 Link type "noreferrer"". WHATWG. 2016-02-19. Retrieved 2016-02-19.
  17. "Content Security Policy Level 2". W3. 2014. Retrieved 2014-12-08.