Client Hints

Client Hints are a set of HTTP header fields and a JavaScript web application programming interface (API) for proactive content negotiation in the Hypertext Transfer Protocol (HTTP). Through these fields, the client can advertise information about itself so that the server can determine which resources to include in its response. First proposed in 2013 by engineers at Google, Client Hints were later presented as a privacy-preserving alternative to User-Agent header strings as part of Google's Privacy Sandbox initiative. The initial design of Client Hints faced pushback from other browser vendors over various privacy concerns. As of May 2024, over 75% of all internet traffic supports Client Hints. Despite this widespread adoption, privacy researchers have raised concerns that Client Hints are primarily being used by tracking scripts.

Background

Since the early days of the internet, servers have sought to identify what kind of client a user is connecting with. In 1992, an extension to HTTP introduced the User-Agent header, which the client sends to the server and which contained a simple string identifying the name and version of the client. The header was intended purely for statistical purposes and for tracking down clients that violated the protocol. As the internet evolved, User-Agent strings became increasingly complex and came to contain significant granular information about the user. This information is often used in browser fingerprinting, which allows sites to track users across the web passively, without having to run any JavaScript in the user's browser. [1]

History

The original draft of the Client Hints specification was proposed in 2013 by engineers at Google. The specification became an Internet Engineering Task Force (IETF) draft in November 2015 and was published as an experimental RFC in 2021. Around the same time, the specifications for handling HTTP Client Hints on the web were published as a draft in a W3C Community Group Report. [2]

In 2020, Google announced its intention to deprecate user-agent (UA) strings as part of its Privacy Sandbox initiative, citing Client Hints as a privacy-preserving alternative. [1] The initial Client Hints proposal met with pushback from other browser vendors because of privacy concerns. Mozilla, the maker of Firefox, initially classified the proposal as harmful, and Apple took a negative stance toward it. [1] Brave also raised concerns about the initial proposal, citing ways in which it could be used to track users across the internet. [3] Despite these concerns, Chrome implemented support for HTTP Client Hints in August 2020. Although the deprecation of UA strings was delayed by the COVID-19 pandemic, the process was completed in February 2023. [1]

Since their initial opposition, Mozilla has changed its stance to neutral, and Brave has aligned its implementation of Client Hints with Chrome's. [1] As of May 2024, over 75% of all web users use a browser that supports Client Hints. [2]

Mechanism

The Client Hints protocol defines two entities: a user agent (UA), typically a browser, and a server. The two communicate to negotiate what kind of content should be served to the user. [4] The server sends the UA a response with an Accept-CH HTTP header listing the client hint headers it wants to receive. The UA is then expected to include the requested client hints in each subsequent request, provided it supports those hints; if the UA does not understand or support a particular client hint, it ignores that hint. The server uses these headers to decide what kind of content to serve the UA. [2] When a cacheable response varies based on client hint values, the server must list the client hint headers it relied on in a Vary response header. [1] This ensures that caching mechanisms understand that responses can differ depending on client hint values. [5] For client hints that specifically identify the browser, additional made-up browser brand identifiers are included (a practice known as GREASE) to prevent the protocol from ossifying around browser sniffing. [6]
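The negotiation described above can be sketched from the user agent's side. The Python below is a simplified, hypothetical model rather than any browser's actual implementation: the HintingClient class, its method names, and the assumption that header names arrive lower-cased are all illustrative. It records the hints an origin requests through Accept-CH and attaches only the hints the client supports to later requests, ignoring the rest as the protocol requires.

```python
from dataclasses import dataclass, field


def parse_accept_ch(value: str) -> list[str]:
    """Split an Accept-CH value into lower-cased hint names.

    Simplified: treats the value as a comma-separated list of tokens and
    does not implement full RFC 8941 structured-field parsing.
    """
    return [token.strip().lower() for token in value.split(",") if token.strip()]


@dataclass
class HintingClient:
    """Hypothetical user agent that remembers Accept-CH per origin."""

    # Hints this client can produce, keyed by lower-cased header name.
    supported: dict[str, str]
    # Hints each origin asked for in its most recent Accept-CH header.
    requested: dict[str, list[str]] = field(default_factory=dict)

    def on_response(self, origin: str, headers: dict[str, str]) -> None:
        """Record the hints requested by a response (header names lower-cased)."""
        accept_ch = headers.get("accept-ch")
        if accept_ch is not None:
            self.requested[origin] = parse_accept_ch(accept_ch)

    def request_headers(self, origin: str) -> dict[str, str]:
        """Return hint headers for the next request to this origin.

        Requested hints the client does not support are silently ignored.
        """
        return {
            name: self.supported[name]
            for name in self.requested.get(origin, [])
            if name in self.supported
        }
```

For example, a client created with supported={"viewport-width": "1920"} that receives Accept-CH: Viewport-Width, Device-Memory from an origin would send only viewport-width: 1920 to that origin afterwards, because it does not support the second hint.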

For UAs that support JavaScript, an additional option is available through the navigator.userAgentData JavaScript API, which enables scripts to retrieve the same information that the Client Hints headers provide. [1]
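On the server side, the browser brand list that navigator.userAgentData exposes as its brands property arrives in the Sec-CH-UA request header, as entries such as "Chromium";v="124". The function below is a minimal sketch of reading that header into the same brand/version shape. It uses a regular expression instead of a full RFC 8941 structured-field parser, so it assumes brand and version strings contain no backslash escapes; the sample header value is illustrative only. An unfamiliar-looking brand in the result may be a GREASE entry rather than a real browser.

```python
import re

# One list member such as "Google Chrome";v="124". Simplified: assumes brand and
# version strings contain no backslash escapes or embedded double quotes.
_MEMBER = re.compile(r'"([^"\\]*)"\s*;\s*v\s*=\s*"([^"\\]*)"')


def parse_sec_ch_ua(value: str) -> list[dict[str, str]]:
    """Return brand/version pairs in the same shape as navigator.userAgentData.brands."""
    return [
        {"brand": brand, "version": version}
        for brand, version in _MEMBER.findall(value)
    ]


if __name__ == "__main__":
    # Illustrative header value; real values differ between browsers and versions.
    header = '"Chromium";v="124", "Google Chrome";v="124", "Not-A.Brand";v="99"'
    for entry in parse_sec_ch_ua(header):
        print(entry["brand"], entry["version"])
```

Because GREASE brands can contain characters such as ";" and "=" inside the quotes, the pattern matches whole quoted strings rather than splitting the header on punctuation.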

Example

To initiate content negotiation, an HTTP server adds the Accept-CH header to its response to an HTTP request:

HTTP/1.1 200 OK
...
Accept-CH: Viewport-Width
...

If the user agent supports the Viewport-Width client hint, it will include the Viewport-Width header in every subsequent request:

GET /gallery HTTP/1.1
...
Viewport-Width: 1920
...

The server can then use the information in the Viewport-Width header to decide what kind of content to serve the user agent. For example, if the server has a particularly large image, it can be configured to return a smaller version when the original does not fit the viewport. [7]
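The image example can be sketched as server-side logic. In the hypothetical Python below, the VARIANTS table, the file names, the lower-cased request header names, and the fallback policy (serving the largest image when no usable hint is present) are assumptions for illustration only. The function also returns Accept-CH, to request the hint on later requests, and Vary, so that caches do not serve an image chosen for one viewport width to a client reporting another.

```python
# Hypothetical image variants: pixel width -> file name.
VARIANTS = {
    480: "hero-480.jpg",
    1024: "hero-1024.jpg",
    1920: "hero-1920.jpg",
    3840: "hero-3840.jpg",
}


def choose_image(request_headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Pick the smallest variant at least as wide as the reported viewport.

    request_headers is assumed to use lower-cased header names.
    Returns the chosen file name and the response headers to send.
    """
    response_headers = {
        "Accept-CH": "Viewport-Width",  # ask for the hint on later requests
        "Vary": "Viewport-Width",  # the response depends on this header
    }
    largest = VARIANTS[max(VARIANTS)]

    raw = request_headers.get("viewport-width")
    try:
        viewport = int(raw) if raw is not None else 0
    except ValueError:
        viewport = 0  # malformed hint: treat it as missing

    if viewport <= 0:
        # No usable hint (first visit, or the UA ignored the request).
        return largest, response_headers

    for width in sorted(VARIANTS):
        if width >= viewport:
            return VARIANTS[width], response_headers
    return largest, response_headers  # viewport wider than every variant
```

With a Viewport-Width of 800, this sketch would pick hero-1024.jpg. A real deployment would likely also consider the device pixel ratio, which is omitted here for brevity.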

Privacy concerns

When the Client Hints proposal was first published, it drew significant privacy concerns. Browser vendors such as Brave and Mozilla pointed out that a provision in the initial draft allowed websites to instruct the browser to send Client Hints data to third-party domains. Third-party domains are domains other than the site the user is visiting, from which a page loads resources such as images and script files; they can receive HTTP headers without executing any JavaScript code. [3] This provision would have allowed third parties such as content delivery networks (CDNs) and cloud service providers acting as TLS terminators, such as Cloudflare and Google Cloud, to track users across the web by instructing the browser to send Client Hints information to their servers. [3] [8] Critics also argued that the proposal was too permissive, because it explicitly allowed new privacy-compromising information, which could not be obtained simply by parsing HTTP headers, to be leaked to servers. [8] Privacy-focused extensions such as NoScript also opposed the proposal on the grounds that it would make it significantly harder to prevent sites from exfiltrating privacy-compromising information about users. [3]

Since the adoption of Client Hints by Chromium-based browsers,[citation needed] privacy researchers have raised concerns over their real-world use for tracking. A 2023 study by researchers from KU Leuven and Radboud University found that, in a crawl of over 100,000 websites, 60% of the scripts accessed the Client Hints JavaScript APIs; most of these were tracking and advertising scripts, many of which came from Google. Over 90% of these scripts exfiltrated the data they obtained to tracking domains. [1] A subsequent study published in May 2024 by researchers from the Hochschule Bonn-Rhein-Sieg University of Applied Sciences noted that while overall adoption of Client Hints among websites was low, a significant number of third-party domains known for tracking accessed HTTP Client Hints data. [2]

References

  1. Senol, Asuman; Acar, Gunes (2023-11-26). "Unveiling the Impact of User-Agent Reduction and Client Hints: A Measurement Study". Proceedings of the 22nd Workshop on Privacy in the Electronic Society. ACM. pp. 91–106. doi:10.1145/3603216.3624965. ISBN 979-8-4007-0235-8. Archived from the original on 2024-06-26. Retrieved 2024-06-25.
  2. Wiefling, Stephan; Hönscheid, Marian; Iacono, Luigi Lo (2024-05-22). "A Privacy Measure Turned Upside Down? Investigating the Use of HTTP Client Hints on the Web". arXiv:2405.13744 [cs].
  3. Cimpanu, Catalin (2019-05-16). "Privacy concerns raised about upcoming Client-Hints web standard". ZDNET. Archived from the original on 2023-12-01. Retrieved 2024-06-02.
  4. Grigorik, I.; Weiss, Y. (February 2021). HTTP Client Hints. IETF. doi:10.17487/RFC8942. RFC 8942. Retrieved 2021-02-11.
  5. "HTTP Client hints". HTTP. MDN. 2024-03-05. Archived from the original on 2024-06-07. Retrieved 2024-06-02.
  6. Taylor, Mike; Weiss, Yoav, eds. (2024-04-01). "User-Agent Client Hints § 6.2. GREASE-like UA Brand Lists". WICG. Archived from the original on 2024-06-18. Retrieved 2024-06-26.
  7. "Improving user privacy and developer experience with User-Agent Client Hints". Privacy & Security. Chrome for Developers. Archived from the original on 2024-06-02. Retrieved 2024-06-02.
  8. "Brave's Concerns with the Client-Hints Proposal". Brave. 2019-05-09. Archived from the original on 2024-06-26. Retrieved 2024-06-02.