International standard | |
---|---|
Developed by | Google, W3C |
Website | https://wicg.github.io/ua-client-hints/ |
Client Hints are an extension to the Hypertext Transfer Protocol (HTTP) that allows web servers to ask the client (usually a web browser) for information about its configuration. The client can choose to respond by advertising the requested information in HTTP header fields, or by exposing the same information to the JavaScript code being executed on a web page. The server can then tailor its responses to the client; for example, it can choose to send a smaller image if the client advertises that it has a very small screen.
Proposed by Google engineers in 2013, Client Hints were designed as a privacy-focused alternative to user-agent headers. This work was part of a Google initiative called Privacy Sandbox. User-agent headers are text sent by a client to a server to identify the client. While initially intended for statistical purposes, these headers had increasingly become a tool for tracking users across websites. Client Hints aimed to address this issue by providing a more controlled way to share the same information. Despite the focus on privacy, the initial design of Client Hints faced criticism from other browsers. One of the primary concerns was that the protocol could enable new forms of tracking by third-party domains, which are servers not owned by the website itself from which resources like images and script files are loaded. Despite these concerns, Chrome implemented support for Client Hints in August 2020. By May 2024, over 75% of web users used browsers that supported Client Hints.
Privacy researchers have since raised concerns that Client Hints are primarily being used by JavaScript code that tracks users. In 2023, a study from KU Leuven and Radboud University found that, amongst the top 100,000 websites on the internet, most accesses of Client Hints came from JavaScript code used for tracking and advertising purposes.
In 1992, an extension to the HTTP protocol introduced the User-Agent HTTP header, which is sent from the client to the server and contains a simple string identifying the name of the client and its version. The header was meant purely for statistical purposes and for tracking down clients that violated the protocol. Since then, User-Agent headers have become increasingly complex and have started to contain significant, uniquely identifiable information about the user. Often, this information is used to perform browser fingerprinting, allowing sites to track users across sites passively without having to load any JavaScript for the user. [1]
The original draft of the Client Hints specification was proposed in 2013 by engineers at Google. The specification became an Internet Engineering Task Force (IETF) draft in November 2015. Subsequently, in 2021, it was published as an experimental Request for Comments (RFC). [2] This designation indicated that the IETF had accepted the Client Hints specification for publication, but that it either still had unresolved questions or had not yet gained widespread adoption on the internet. [3] Around the same time, the specification for how web browsers would handle HTTP Client Hints on the web was published as a draft in a W3C Community Group Report. [2]
In 2020, Google announced its intention to deprecate the user-agent (UA) declaration by the browser. [4] This deprecation was part of Privacy Sandbox, a broader initiative by Google to make changes to the web that allow websites to access user information without compromising privacy. Google cited Client Hints as a privacy-preserving alternative to user-agent headers, since they allowed for a more controlled way of sharing the same information. [1] The initial Client Hints proposal, however, was met with pushback from other browsers due to privacy concerns. In 2019, Brave raised concerns about the initial proposal, citing ways in which it could be used to track users on the internet. [5] Mozilla, the company that makes Firefox, initially classified the proposal as harmful, and Apple, the company that makes Safari, also took a negative stance against the proposal. [1] Despite these concerns, Chrome implemented support for HTTP Client Hints in August 2020. While the deprecation of UA strings was delayed due to the COVID-19 pandemic, the process was completed in February 2023. [1]
Since its initial opposition, Mozilla has updated its stance to neutral, and Brave has synchronized its implementation of Client Hints with that of Chrome. [1] As of May 2024, over 75% of all web users use browsers that support Client Hints. [2]
The Client Hints protocol defines two entities: a user agent (UA), typically a browser, and a server. These two entities communicate with each other to negotiate what kind of content should be served to the user. [6] The server first sends the UA a response with an Accept-CH HTTP header containing a list of the Client Hint HTTP headers that it requests. The UA is then expected to return the requested client hints with each subsequent request, provided it supports those hints. The server uses these headers to decide what kind of content to serve the UA. [2] If the UA does not understand or support a particular client hint, it is instructed to ignore that hint. Where a response depends on a particular client hint, and so cannot be reused from a cache for clients that send different values, the server must list the applicable client hint headers in a separate Vary header sent to the UA. [1] This ensures that caching mechanisms understand that responses can vary based on different client hint values. [7] For client hints that specifically identify a browser, additional random browser identifiers are included as grease in order to prevent users of the protocol from relying on browser-specific idiosyncratic behaviours. [8]
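The server side of this negotiation can be sketched as follows. This is a minimal illustration rather than part of any specification or library: the helper name `buildNegotiationHeaders` and its parameters are hypothetical, and it only shows how the Accept-CH and Vary values described above could be assembled.

```javascript
// Hypothetical helper (not from any library): builds the response headers a
// server could send to request client hints and to tell caches which hints
// the response depends on.
function buildNegotiationHeaders(requestedHints, hintsAffectingResponse = []) {
  const headers = {};
  if (requestedHints.length > 0) {
    // Accept-CH lists the client hint headers the server asks the UA to send.
    headers['Accept-CH'] = requestedHints.join(', ');
  }
  if (hintsAffectingResponse.length > 0) {
    // Vary tells caches that the response differs by these request headers.
    headers['Vary'] = hintsAffectingResponse.join(', ');
  }
  return headers;
}

const headers = buildNegotiationHeaders(['Viewport-Width'], ['Viewport-Width']);
console.log(headers);
```

Keeping the two lists separate reflects that a server may request more hints than the ones that actually change a given response.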
For UAs that allow JavaScript, an additional option is available through the navigator.userAgentData JavaScript API. This API enables JavaScript to retrieve the same information as provided by the Client Hints headers. [1] The API separates the data it provides into two types: low-entropy data and high-entropy data. Low-entropy data is information that is likely to be similar across a large group of users, such as the platform on which the browser is running and the brand of the browser. In contrast, high-entropy data may vary significantly between users and includes details such as the exact version number of the browser and the model of the user's device. Low-entropy data is exposed as properties of the API object, whereas high-entropy data, which can uniquely identify the user, must be explicitly requested by calling the API's getHighEntropyValues() function, which allows the browser to ask for user permission or to perform additional checks. [9]
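The split between low-entropy and high-entropy data might look like this in practice. The sketch below is illustrative only: the helper names `readLowEntropy` and `readHighEntropy` are hypothetical, the choice of the `platformVersion` and `model` hints is an assumption made for the example, and the API object is passed in as a parameter so the code also runs where navigator.userAgentData is unavailable.

```javascript
// Low-entropy values are plain properties on the userAgentData object.
function readLowEntropy(uaData) {
  if (!uaData) return null; // API not supported by this user agent
  return {
    brands: uaData.brands.map((b) => `${b.brand} ${b.version}`),
    mobile: uaData.mobile,
    platform: uaData.platform,
  };
}

// High-entropy values must be requested explicitly; the returned promise gives
// the browser a chance to run permission or policy checks before answering.
async function readHighEntropy(uaData, hints = ['platformVersion', 'model']) {
  if (!uaData || typeof uaData.getHighEntropyValues !== 'function') return null;
  return uaData.getHighEntropyValues(hints);
}

// In a supporting browser this is navigator.userAgentData; elsewhere undefined.
const uaData = typeof navigator !== 'undefined' ? navigator.userAgentData : undefined;
console.log(readLowEntropy(uaData));
readHighEntropy(uaData).then((values) => console.log(values));
```

Both helpers return null when the API is missing, since not all browsers implement it.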
To initiate content negotiation, an HTTP server adds the Accept-CH header to the response to an HTTP request:

    HTTP/1.1 200 OK
    ...
    Accept-CH: Viewport-Width
    ...

If the user agent supports the viewport width client hint, it will append the Viewport-Width header to every subsequent request:

    GET /gallery HTTP/1.1
    ...
    Viewport-Width: 1920
    ...

The server can then use the information in the Viewport-Width header to decide what kind of content to serve the client. For example, if the server has a particular image that is extremely large, it can be configured to return a smaller image if the original does not fit the viewport. [10]
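One way a server could act on the Viewport-Width header is sketched below. The variant list, file names, and the function `pickImageVariant` are all hypothetical, and the code assumes header names have already been lower-cased (as Node's `http` module does for incoming requests).

```javascript
// Hypothetical image variants, sorted by ascending pixel width.
const VARIANTS = [
  { width: 480, file: 'photo-480.jpg' },
  { width: 1024, file: 'photo-1024.jpg' },
  { width: 1920, file: 'photo-1920.jpg' },
];

// Picks the smallest variant that still covers the advertised viewport.
// Falls back to the largest variant when the hint is missing or malformed.
function pickImageVariant(requestHeaders, variants = VARIANTS) {
  const largest = variants[variants.length - 1];
  const viewport = Number.parseInt(requestHeaders['viewport-width'], 10);
  if (!Number.isFinite(viewport) || viewport <= 0) return largest.file;
  const fit = variants.find((v) => v.width >= viewport);
  return (fit || largest).file;
}

console.log(pickImageVariant({ 'viewport-width': '1920' })); // photo-1920.jpg
console.log(pickImageVariant({ 'viewport-width': '360' }));  // photo-480.jpg
console.log(pickImageVariant({}));                           // photo-1920.jpg
```

Falling back to the largest variant keeps behaviour unchanged for clients that do not send the hint.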
When the Client Hints proposal was originally published, it was met with significant privacy concerns. Browser vendors such as Brave and Mozilla pointed out that a provision in the initial draft allowed websites to instruct the browser to provide Client Hint data to third-party domains, that is, servers not owned by the website itself from which resources like images and script files are loaded. [5] This provision would have allowed third parties to track users across the web by instructing the browser to send Client Hint information to their servers. Such third parties include content delivery networks (CDNs), which distribute website content across geographically dispersed servers to improve a website's speed and reliability, and cloud service providers like Cloudflare and Google Cloud, which offer services like data storage, computing power, and infrastructure for websites and applications. [5] [11] Concerns were also raised that the Client Hints proposal was too permissive and explicitly allowed new privacy-compromising information, which could not be obtained by simply reading HTTP headers, to be leaked to servers. [11] Privacy-preserving extensions such as NoScript also opposed the proposal on the grounds that it would make it significantly harder to prevent sites from exfiltrating privacy-compromising information about users. [5]
Since the adoption of Client Hints by major browsers like Google Chrome and Microsoft Edge, privacy researchers have raised concerns over their real-world use for tracking. [2] A 2023 study by researchers from KU Leuven and Radboud University found that out of the top 100,000 websites, 60% of JavaScript files loaded by web pages accessed the Client Hints JavaScript APIs, with most being tracking and advertising scripts, many of which came from Google. Over 90% of these script files exfiltrated the obtained data to tracking domains. [1] A subsequent study in May 2024 by researchers from the Hochschule Bonn-Rhein-Sieg University of Applied Sciences noted that while overall adoption of Client Hints amongst websites on the internet was low, a significant number of third-party domains known for tracking accessed HTTP Client Hints data. [2]