History sniffing

History sniffing is a class of web vulnerabilities and attacks that allow a website to learn a user's browsing history by determining which websites the user has visited and which they have not. This is done by exploiting long-standing information leaks inherent to the design of the web platform, the best known of which is detecting CSS style changes in links that the user has already visited.

Although it has been known since 2002, history sniffing is still considered an unsolved problem. In 2010, researchers revealed that multiple high-profile websites had used history sniffing to identify and track users. Shortly afterwards, Mozilla and the other major browser vendors implemented defences against history sniffing. However, later research has shown that these mitigations are ineffective against specific variants of the attack, and history sniffing can still occur via visited links and newer browser features.

Background

Early browsers such as Mosaic and Netscape Navigator were built on the model of the web being a set of statically linked documents known as pages. In this model, it made sense for the user to know which documents they had previously visited and which they hadn't, regardless of which document was referring to them. [1] Mosaic, one of the earliest graphical web browsers, used purple links to show that a page had been visited and blue links to show pages that had not been visited. [2] [3] This paradigm stuck around and was subsequently adopted by all modern web browsers. [4]

Over the years, the web evolved from its original model of static content towards more dynamic content. In 1995, employees at Netscape added a scripting language, JavaScript, to the company's flagship web browser, Netscape Navigator. This addition allowed web authors to add interactivity to a web page by executing JavaScript programs as part of the rendering process. [5] [6] However, it also introduced a new security problem: these JavaScript programs could access each other's execution context and sensitive information about the user. As a result, shortly afterwards, Netscape Navigator introduced the same-origin policy. This security measure prevented JavaScript from arbitrarily accessing data in a different web page's execution context. [7] However, while the same-origin policy was subsequently extended to cover a large variety of features introduced before its existence, it was never extended to cover hyperlinks, since doing so was perceived to hurt the user's ability to browse the web. [4] This seemingly innocuous omission would give rise to one of the earliest and best-known forms of history sniffing on the web. [8]

History

By extracting the colour of certain links, a website can access personally identifiable information. In this example, the website could infer that the user might be interested in leukemia, a form of blood cancer.

One of the first publicly disclosed reports of a history sniffing exploit was made by Andrew Clover from Purdue University in a mailing list post on BUGTRAQ in 2002. The post detailed how a malicious website could use JavaScript to determine if a given link was of a specific colour, thus revealing if the link had been previously visited. [9] While this was initially thought to be a theoretical exploit with little real-world value, later research by Jang et al. in 2010 revealed that high-profile websites were using this technique in the wild to reveal user browsing data. [10] As a result, multiple lawsuits were filed against the websites found to have used history sniffing, alleging a violation of the Computer Fraud and Abuse Act of 1986. [8]
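The colour query described in the post can be sketched roughly as follows. This is a minimal illustration of the historical technique, not the code from the original post: the names `LinkProbeEnv`, `probeVisited` and `legacyBrowser` and the colour values are assumptions made for the example, and the browser is replaced by a small stand-in so the sketch runs on its own. Browsers that adopted the 2010 defence report the unvisited style to scripts, so a probe of this kind no longer distinguishes the two states.

```typescript
// Minimal environment the probe needs. In a real page these would be backed by
// document.createElement("a") and window.getComputedStyle (assumed mapping).
interface LinkProbeEnv {
  // Creates a link to `url`, attaches it to the page, and returns a handle.
  createLink(url: string): unknown;
  // Returns the colour that the scripting API reports for that link.
  reportedColor(link: unknown): string;
}

// Assumed colours for the example; a page could style links however it likes.
const VISITED_COLOR = "rgb(85, 26, 139)";
const UNVISITED_COLOR = "rgb(0, 0, 238)";

// Returns true when the reported colour matches the visited style.
function probeVisited(env: LinkProbeEnv, url: string): boolean {
  const link = env.createLink(url);
  return env.reportedColor(link) === VISITED_COLOR;
}

// Hypothetical stand-in for a browser without the 2010 defence: the colour it
// reports depends on whether the URL is in the user's history.
function legacyBrowser(history: readonly string[]): LinkProbeEnv {
  return {
    createLink: (url: string) => url,
    reportedColor: (link: unknown) =>
      history.indexOf(link as string) !== -1 ? VISITED_COLOR : UNVISITED_COLOR,
  };
}

const probeEnv = legacyBrowser(["https://example.org/a"]);
const probeResults = ["https://example.org/a", "https://example.org/b"].map(
  (url) => probeVisited(probeEnv, url)
);
```

Passing the environment in as a parameter is a design choice for the sketch only; it keeps the probe logic separate from any particular browser behaviour.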

In the same year, L. David Baron from Mozilla Corporation developed a defence against the attack that all major browsers would later adopt. The defence restricted which CSS properties could be used to style visited links. The ability to add background images and CSS transitions to visited links was disallowed. Additionally, visited links would be treated identically to standard links by JavaScript application programming interfaces (APIs) that allow a website to query the colour of specific elements: these APIs return the same values for a visited link as for a non-visited link. This ensured malicious websites could not simply infer a person's browsing history by querying the colour changes. [11]
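The effect of this defence can be illustrated with a toy model. The sketch below is not browser code; the class `ToyBrowser`, its method names, and the colour values are hypothetical, and it models only the behaviour described above: visited links may be painted differently on screen, but the script-facing query returns the unvisited style for every link.

```typescript
// Assumed colours for the example.
const UNVISITED = "rgb(0, 0, 238)";
const VISITED = "rgb(85, 26, 139)";

// Toy model of a browser that applies the defence (hypothetical, not a real API).
class ToyBrowser {
  private readonly history: readonly string[];

  constructor(history: readonly string[]) {
    this.history = history;
  }

  // What the user sees on screen: visited links still look different.
  paintedColor(url: string): string {
    return this.history.indexOf(url) !== -1 ? VISITED : UNVISITED;
  }

  // What a script is told when it queries the link's style: always the
  // unvisited colour, regardless of the user's history.
  scriptVisibleColor(_url: string): string {
    return UNVISITED;
  }
}

const toyBrowser = new ToyBrowser(["https://example.org/a"]);
const toyUrls = ["https://example.org/a", "https://example.org/b"];
const painted = toyUrls.map((u) => toyBrowser.paintedColor(u));
const reported = toyUrls.map((u) => toyBrowser.scriptVisibleColor(u));
```

In this model `painted` differs between the two URLs while `reported` does not, which is why a script that only reads the reported colour learns nothing about the history.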

In 2011, research by then-Stanford graduate student Jonathan Mayer found that advertising company Epic Marketplace Inc. had used history sniffing to collect information about the browsing history of users across the web. [12] [13] A subsequent investigation by the Federal Trade Commission (FTC) revealed that Epic Marketplace had used history sniffing code as a part of advertisements in over 24,000 web domains, including ESPN and Papa John's. The JavaScript code allowed Epic Marketplace to track whether a user had visited any of over 54,000 domains. [14] [15] The resulting data was subsequently used by Epic Marketplace to categorize users into specific groups and serve advertisements based on the websites the user had visited. As a result of this investigation, the FTC issued a twenty-year order that barred Epic Marketplace Inc. from using history sniffing and required it to permanently delete the data it had collected. [16] [15]

Threat model

The threat model of history sniffing relies on the adversary being able to direct the victim to a malicious website entirely or partially under the adversary's control. The adversary can accomplish this by compromising a previously benign web page, by luring the user through phishing to a web page on which the adversary can load arbitrary code, or by using a malicious advertisement on an otherwise safe web page. [8] [17] While most history sniffing attacks do not require user interaction, specific variants need users to interact with particular elements, which can be disguised as buttons, browser games, CAPTCHAs, and other such elements. [4]

Modern variants

Despite being partially mitigated in 2010, history sniffing is still considered an unsolved problem. [8] In 2011, researchers at Carnegie Mellon University showed that while the defences proposed by Mozilla were sufficient to prevent most non-interactive attacks, such as those found by Jang et al., they were ineffective against interactive attacks. By showing users overlaid letters, numbers and patterns that would only reveal themselves if a user had visited a specific website, the researchers were able to trick 307 participants into potentially revealing their browsing history. The tasks were presented as pattern-solving problems, chess games and CAPTCHAs. [18] [4]

In 2018, researchers at the University of California, San Diego demonstrated timing attacks that could bypass the mitigations introduced by Mozilla. By abusing the CSS paint API (which allows developers to draw a background image programmatically) and targeting the bytecode cache of the browser, the researchers were able to measure how long it took to paint specific links. From these measurements, they derived probabilistic techniques for identifying visited websites. [19] [20]

Since 2019, multiple history sniffing attacks have been found that target newer browser features. In 2020, Sanchez-Rola et al. demonstrated that a website could perform history sniffing by measuring the time a server takes to respond to a request sent with HTTP cookies and comparing it to the time the server takes to respond to a request sent without cookies. [21] In 2023, Ali et al. demonstrated that newly introduced browser features could also be abused to perform history sniffing. One particularly notable example was the Private Tokens API, a feature introduced under Google's Privacy Sandbox initiative with the intention of preventing user tracking, which could allow malicious actors to exfiltrate users' browsing data using techniques similar to those used for cross-site leak attacks. [22]
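The comparison behind the cookie-based technique can be sketched as a small decision function. This is an illustration under stated assumptions rather than the method from the paper: the function name `likelyHasCookies`, the use of medians, and the 5 ms threshold are all hypothetical, and the timing samples are made-up numbers, not measurements. The assumed intuition is that if the browser holds no cookies for a site, both kinds of request look the same to the server and take about equally long, whereas a noticeable gap suggests that cookies from an earlier visit changed how the server processed the request.

```typescript
// Median of a non-empty list of timing samples (in milliseconds).
function median(samples: readonly number[]): number {
  if (samples.length === 0) {
    throw new Error("median of empty sample set");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical decision rule: report a likely earlier visit when the median
// response times of the two request types differ by more than `thresholdMs`.
function likelyHasCookies(
  withCookiesMs: readonly number[],
  withoutCookiesMs: readonly number[],
  thresholdMs: number = 5
): boolean {
  const gap = Math.abs(median(withCookiesMs) - median(withoutCookiesMs));
  return gap > thresholdMs;
}

// Made-up samples: a clear gap in the first case, none in the second.
const returningVisitor = likelyHasCookies([48, 52, 50, 95], [30, 31, 29, 33]);
const firstTimeVisitor = likelyHasCookies([30, 32, 31, 29], [31, 30, 29, 32]);
```

Medians are used here instead of means so that a single slow response, such as the 95 ms sample, does not dominate the comparison; any real measurement would need many more samples and a threshold tuned to network noise.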

References

  1. "WorldWideWeb: Proposal for a HyperText Project". www.w3.org. Archived from the original on 29 June 2023. Retrieved 15 November 2023.
  2. "Why are hyperlinks blue? | The Mozilla Blog". blog.mozilla.org. Archived from the original on 15 November 2023. Retrieved 15 November 2023.
  3. "EMail Msg". ksi.cpsc.ucalgary.ca. Archived from the original on 15 November 2023. Retrieved 15 November 2023.
  4. Weinberg, Zachary; Chen, Eric Y.; Jayaraman, Pavithra Ramesh; Jackson, Collin (2011). "I Still Know What You Visited Last Summer: Leaking Browsing History via User Interaction and Side Channel Attacks". 2011 IEEE Symposium on Security and Privacy. IEEE. pp. 147–161. doi:10.1109/SP.2011.23. ISBN 978-1-4577-0147-4. S2CID 10662023. Archived from the original on 24 December 2022. Retrieved 30 October 2023.
  5. "JavaScript 1.0 – 1995". www.webdesignmuseum.org. Archived from the original on 7 August 2020. Retrieved 19 January 2020.
  6. "Welcome to Netscape Navigator Version 2.0". netscape.com. 14 June 1997. Archived from the original on 14 June 1997. Retrieved 16 February 2020.
  7. "Netscape 3.0 Handbook – Advanced topics". netscape.com. Archived from the original on 8 August 2002. Retrieved 16 February 2020. Navigator version 2.02 and later automatically prevents scripts on one server from accessing properties of documents on a different server.
  8. Van Goethem, Tom; Joosen, Wouter; Nikiforakis, Nick (12 October 2015). "The Clock is Still Ticking: Timing Attacks in the Modern Web". Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. CCS '15. New York, NY, USA: Association for Computing Machinery. pp. 1382–1393. doi:10.1145/2810103.2813632. ISBN 978-1-4503-3832-5. S2CID 17705638.
  9. "Bugtraq: CSS visited pages disclosure". seclists.org. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  10. Jang, Dongseok; Jhala, Ranjit; Lerner, Sorin; Shacham, Hovav (4 October 2010). "An empirical study of privacy-violating information flows in JavaScript web applications". Proceedings of the 17th ACM conference on Computer and communications security. CCS '10. New York, NY, USA: Association for Computing Machinery. pp. 270–283. doi:10.1145/1866307.1866339. ISBN 978-1-4503-0245-6. S2CID 10901628.
  11. "privacy-related changes coming to CSS:visited – Mozilla Hacks – the Web developer blog". Mozilla Hacks – the Web developer blog. Archived from the original on 7 June 2023. Retrieved 16 November 2023.
  12. "Tracking the Trackers: To Catch a History Thief". cyberlaw.stanford.edu. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  13. Goodin, Dan. "Marketer taps browser flaw to see if you're pregnant". www.theregister.com. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  14. "FTC Final Order Prohibits Epic Marketplace From "History Sniffing"". JD Supra. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  15. "FTC Settlement Puts an End to "History Sniffing" by Online Advertising Network Charged With Deceptively Gathering Data on Consumers". Federal Trade Commission. 5 December 2012. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  16. Gross, Grant (5 December 2012). "US FTC bars advertising firm from sniffing browser histories". Computerworld. Archived from the original on 16 November 2023. Retrieved 16 November 2023.
  17. Sanchez-Rola, Iskander; Balzarotti, Davide; Santos, Igor (22 December 2020). "Cookies from the Past: Timing Server-side Request Processing Code for History Sniffing". Digital Threats: Research and Practice. 1 (4): 24:1–24:24. doi:10.1145/3419473.
  18. Kikuchi, Hiroaki; Sasa, Kota; Shimizu, Yuta (2016). "Interactive History Sniffing Attack with Amida Lottery". 2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). IEEE. pp. 599–602. doi:10.1109/IMIS.2016.109. ISBN 978-1-5090-0984-8. S2CID 32216851. Archived from the original on 6 June 2018. Retrieved 30 October 2023.
  19. Haskins, Caroline (2 November 2018). "Old School 'Sniffing' Attacks Can Still Reveal Your Browsing History". Vice. Retrieved 30 October 2023.
  20. Smith, Michael; Disselkoen, Craig; Narayan, Shravan; Brown, Fraser; Stefan, Deian (2018). "Browser history {re:visited}". 12th USENIX Workshop on Offensive Technologies (WOOT '18). S2CID 51939166.
  21. Sanchez-Rola, Iskander; Balzarotti, Davide; Santos, Igor (22 December 2020). "Cookies from the Past: Timing Server-side Request Processing Code for History Sniffing". Digital Threats: Research and Practice. 1 (4): 24:1–24:24. doi:10.1145/3419473. S2CID 229716038.
  22. Ali, Mir Masood; Chitale, Binoy; Ghasemisharif, Mohammad; Kanich, Chris; Nikiforakis, Nick; Polakis, Jason (2023). "Navigating Murky Waters: Automated Browser Feature Testing for Uncovering Tracking Vectors (ABTUTV)". Proceedings 2023 Network and Distributed System Security Symposium. Reston, VA: Internet Society. doi:10.14722/ndss.2023.24072. ISBN 978-1-891562-83-9. S2CID 257502501.