Evercookie

'Tor Stinks' NSA presentation

Evercookie (also known as supercookie [1]) is a JavaScript application programming interface (API) that identifies and re-creates cookies that a user has intentionally deleted from the client's browser storage. [2] It was created by Samy Kamkar in 2010 to demonstrate how websites that use respawning can intrude on users' privacy. [3] Websites that adopt this mechanism can identify users even if they try to delete previously stored cookies. [4]


In 2013, Edward Snowden leaked a top-secret NSA document showing that Evercookie could be used to track users of the Tor anonymity network. [5] Many popular companies use functionality similar to Evercookie to collect user information and track users. [1] [6] Further research on fingerprinting and search engines also draws on Evercookie's ability to track a user persistently. [4] [5] [7]

Background

Commonly used client-side data stores include HTTP cookies, Flash cookies, and HTML5 storage. [1] [8] When a user visits a website for the first time, the web server may generate a unique identifier and store it in the user's browser or local storage. [9] On later visits, the website can read the stored identifier to recognize the user, which lets it save the user's preferences and display marketing advertisements. [9] Because of privacy concerns, all major browsers include mechanisms for deleting and/or refusing cookies from websites. [9] [10]
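The first-visit and return-visit flow described above can be sketched with Python's standard `http.cookies` module. This is a minimal illustration, not code from any website named in this article: the cookie name `uid` and the helper functions `issue_identifier` and `recognize` are hypothetical.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional


def issue_identifier() -> tuple[str, str]:
    """First visit: mint a random identifier and build a Set-Cookie header value."""
    uid = uuid.uuid4().hex
    cookie = SimpleCookie()
    cookie["uid"] = uid                            # hypothetical cookie name
    cookie["uid"]["path"] = "/"
    cookie["uid"]["max-age"] = 60 * 60 * 24 * 365  # ask the browser to keep it for a year
    header = cookie["uid"].OutputString()          # "uid=<hex>; Max-Age=31536000; Path=/"
    return uid, header


def recognize(cookie_header: str) -> Optional[str]:
    """Later visit: parse the Cookie request header and return the stored identifier."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("uid")
    return morsel.value if morsel is not None else None


uid, set_cookie = issue_identifier()
# On later requests the browser sends back only "name=value" in the Cookie header.
print(recognize(f"uid={uid}") == uid)  # True
print(recognize(""))                   # None: the user deleted the cookie
```

The second call shows the weakness that respawning techniques target: once the cookie is deleted, a site that relies on this one store can no longer recognize the user.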

In response to users' increasing unwillingness to accept cookies, many websites employ methods to circumvent cookie deletion. [11] Starting in 2009, several research teams found that popular websites, including hulu.com, foxnews.com, and spotify.com, used Flash cookies, ETags, and other data stores to rebuild cookies that users had deleted. [1] [12] [13] [14] In 2010, Samy Kamkar, a Californian programmer, built the Evercookie project to further illustrate tracking through respawning across the various storage mechanisms available in browsers. [3]

Description

Evercookie allows website authors to identify users even after those users have attempted to delete cookies. [15] Samy Kamkar released v0.4 beta of Evercookie on September 13, 2010, as an open source project. [16] [17] [18] Evercookie can respawn deleted HTTP cookies by storing the cookie data in multiple storage systems that web browsers typically expose. [16] When a browser visits a website that uses the Evercookie API, the web server can generate an identifier and store it in the various storage mechanisms available in that browser. [2] If the user removes some but not all of the stored copies and then revisits the website, the identifier is retrieved from the storage areas the user failed to clear. [16] The identifier is then copied back into the storage areas that were cleared. [19]
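The retrieve-and-restore cycle described above can be sketched as follows. This is a simplified model under stated assumptions, not Evercookie's actual code: each browser storage mechanism is represented by a plain Python dictionary, and the store names, the key `evercookie_id`, and the `respawn` helper are all hypothetical.

```python
import uuid

KEY = "evercookie_id"  # hypothetical key name used in every store


def respawn(stores: dict[str, dict[str, str]]) -> str:
    """Return the tracking identifier, writing it back into any store that lost it.

    `stores` maps a hypothetical mechanism name (e.g. "http_cookie") to a plain
    dict standing in for that browser storage mechanism.
    """
    # Gather every copy of the identifier that survived the user's clean-up.
    survivors = [store[KEY] for store in stores.values() if KEY in store]
    if survivors:
        # Copies normally agree; if they do not, keep the most common one.
        ident = max(set(survivors), key=survivors.count)
    else:
        # No copy survived, so this looks like a first visit: mint a new ID.
        ident = uuid.uuid4().hex
    # Respawn: restore the identifier into every store, including cleared ones.
    for store in stores.values():
        store[KEY] = ident
    return ident


stores = {"http_cookie": {}, "local_storage": {}, "etag_cache": {}}
first = respawn(stores)               # first visit: ID written to all three stores
del stores["http_cookie"][KEY]        # the user clears cookies...
del stores["local_storage"][KEY]      # ...and local storage, but not the ETag cache
print(respawn(stores) == first)       # True: recovered from the surviving copy
print(all(s[KEY] == first for s in stores.values()))  # True: all copies restored
```

A single surviving copy is enough to rebuild all the others, which is why partial deletion does not help the user.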

By abusing the various available storage mechanisms to hold copies of the same identifier, Evercookie makes that identifier persistent, because users are unlikely to clear all of them. [20] According to the list provided by Samy Kamkar, [16] v0.4 beta of Evercookie could use 17 storage mechanisms when they are available in a browser.
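The claim that users are unlikely to clear every mechanism can be made concrete with a deliberately simplified model. This model is an illustrative assumption and is not taken from the cited sources: suppose a user clears each of $n$ storage mechanisms independently with probability $p$. Because one surviving copy is enough to respawn the rest, the identifier is lost only when all $n$ copies are cleared:

$$P(\text{identifier survives}) = 1 - p^{n}$$

With the 17 mechanisms of v0.4 beta and $p = 0.5$, this gives $1 - 2^{-17} = 1 - 1/131072 \approx 0.99999$. Real clearing decisions are correlated, since a single "clear browsing data" action often wipes several stores at once, so actual survival rates can be considerably lower than this model suggests.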

Samy Kamkar claims that he did not intend the Evercookie project to be used to violate Internet users' privacy or to be sold to any party for commercial use. However, it has served as an inspiration for commercial websites that later implemented similar mechanisms to restore user-deleted cookies.[citation needed] The Evercookie project is open source, meaning anyone can access and examine the code or use it for any purpose. One of its storage mechanisms is HTML5 storage, which had been released six months before the project and drew public attention for its added persistence. Kamkar intended the project to demonstrate how contemporary tracking tools can intrude on users' privacy. [21] In 2010, a Firefox plug-in named "Anonymizer Nevercookie™" offered one way to prevent Evercookie respawning. [22]

The storage mechanisms incorporated in the Evercookie project have been updated over time, adding to its persistence. Because it combines many existing tracking methods in a single tool, Evercookie consolidates data collection techniques that many commercial websites otherwise used separately. [23] [24] A growing number of commercial websites adopted the idea behind Evercookie and extended it with new storage vectors.

In 2014, a research team at Princeton University conducted a large-scale study of three persistent tracking techniques: Evercookie, canvas fingerprinting, and cookie syncing. The team crawled and analyzed the top 100,000 Alexa websites and detected a new storage vector, IndexedDB, used in an Evercookie-style mechanism on weibo.com. The team described this as the first detected commercial use of IndexedDB for this purpose. [12] The researchers also found that cookie syncing is used in conjunction with Evercookie. Cookie syncing lets different trackers share user identifiers with one another, so an identifier respawned by one tracker can be passed on to others. The team also found instances of Flash cookies respawning HTTP cookies, and of HTTP cookies respawning Flash cookies, on commercial websites. These two mechanisms employ fewer storage mechanisms than the Evercookie project, but they follow the same principle. Among the sites the team crawled, 10 of 200 websites used Flash cookies to rebuild HTTP cookies. Nine of these were Chinese sites (sina.com.cn, weibo.com, hao123.com, sohu.com, ifeng.com, youku.com, 56.com, letv.com, and tudou.com). The tenth was yandex.ru, a top search engine in Russia.[citation needed]
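Cookie syncing is commonly implemented by having one tracker redirect the browser (or load a pixel) to a partner's URL with its own user identifier embedded as a query parameter, so that the partner can link that identifier to its own cookie. The sketch below shows both sides using Python's `urllib.parse`; the domain `tracker-b.example`, the parameter name `partner_uid`, and both helper functions are hypothetical, and real deployments vary.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_sync_url(partner_endpoint: str, my_uid: str) -> str:
    """Tracker A: build the redirect/pixel URL that hands its identifier to tracker B."""
    return f"{partner_endpoint}?{urlencode({'partner_uid': my_uid})}"


def handle_sync(url: str, own_uid: str, match_table: dict[str, str]) -> None:
    """Tracker B: record that its own cookie ID and tracker A's ID belong to one user."""
    params = parse_qs(urlparse(url).query)
    if "partner_uid" in params:
        match_table[own_uid] = params["partner_uid"][0]


table: dict[str, str] = {}
url = build_sync_url("https://tracker-b.example/sync", "a-123")
print(url)    # https://tracker-b.example/sync?partner_uid=a-123
handle_sync(url, own_uid="b-456", match_table=table)
print(table)  # {'b-456': 'a-123'}
```

If tracker A's identifier had just been respawned, the revived value flows into tracker B's match table as well, which is how syncing extends the reach of respawning.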

Applications

A research team from the Slovak University of Technology proposed a mechanism for search engines to infer users' intended search terms and produce personalized search results. Users' queries are often ambiguous and can span several fields, so the results a search engine displays often include much information that is unrelated to what the searcher wants. The authors proposed that a searcher's identity and preferences strongly indicate a query's meaning and can greatly reduce its ambiguity. The team built a metadata-based model that extracts users' information using Evercookie, and integrated this user interest model into a search engine to improve the personalization of search results. Because experiment participants could easily delete traditional cookies, which would leave the experiment data incomplete, the team relied on Evercookie's persistence. [4]
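The general idea of keying an interest profile to a persistent identifier and using it to re-rank results can be sketched as follows. This is a toy illustration and not the team's metadata-based model: profiles here are hypothetical word-count dictionaries, results are scored by simple word overlap, and `update_profile` and `rerank` are invented helper names.

```python
from collections import Counter


def update_profile(profiles: dict[str, Counter], user_id: str, visited_text: str) -> None:
    """Add the words of a page the user visited to that user's interest profile."""
    profiles.setdefault(user_id, Counter()).update(visited_text.lower().split())


def rerank(profiles: dict[str, Counter], user_id: str, results: list[str]) -> list[str]:
    """Order results by how strongly their words match the user's profile."""
    profile = profiles.get(user_id, Counter())  # unknown ID: empty profile, no reordering

    def score(result: str) -> int:
        return sum(profile[word] for word in result.lower().split())

    return sorted(results, key=score, reverse=True)  # stable: ties keep original order


profiles: dict[str, Counter] = {}
update_profile(profiles, "uid-1", "python programming tutorial")
print(rerank(profiles, "uid-1", ["python snake habitat", "python programming guide"]))
# ['python programming guide', 'python snake habitat']
```

If the identifier is lost, as with a deleted cookie, the same user looks like a new visitor with an empty profile and the results fall back to their original order; this is the data loss that motivated using Evercookie.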

Controversial applications

KISSmetrics privacy lawsuit

In a report published on Friday, July 29, 2011, a research team at the University of California, Berkeley described a crawl of the top 100 U.S. websites as ranked by Quantcast. The team found that KISSmetrics, a third-party provider of marketing analytics tools, used HTTP cookies, Flash cookies, ETags, and some (but not all) of the storage mechanisms employed in Samy Kamkar's Evercookie project to respawn users' deleted information. [1] Popular websites such as hulu.com and spotify.com used KISSmetrics, which respawned their first-party HTML5 and HTTP cookies. The research team claimed this was the first time ETag respawning had been observed in a commercial setting. [14]
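ETag respawning works because a browser stores a response's `ETag` alongside the cached resource and sends it back in an `If-None-Match` header when revalidating, even after its cookies are cleared (unless the cache is cleared too). The sketch below is a hypothetical server-side handler, not KISSmetrics' code; only the header names are real HTTP, and request headers are passed as a plain dictionary with canonical capitalization as a simplification.

```python
import uuid
from http.cookies import SimpleCookie


def handle_tracking_request(headers: dict[str, str]) -> tuple[int, dict[str, str], str]:
    """Hypothetical handler for a cacheable tracking resource (e.g. a 1x1 image).

    Returns (status code, response headers, user ID).
    """
    # 1. Use the cookie if the user still has it.
    jar = SimpleCookie()
    jar.load(headers.get("Cookie", ""))
    uid = jar["uid"].value if "uid" in jar else None
    # 2. Cookie gone? Fall back to the ETag the browser cached with the resource,
    #    which it sends back in If-None-Match when revalidating.
    etag = headers.get("If-None-Match")
    if uid is None and etag:
        uid = etag.strip('"')
    # 3. Nothing stored anywhere: treat this as a first visit.
    if uid is None:
        uid = uuid.uuid4().hex
    response = {
        "ETag": f'"{uid}"',                  # the identifier doubles as the entity tag
        "Set-Cookie": f"uid={uid}; Path=/",  # respawns the deleted cookie
    }
    # 304 tells the browser to keep its cached copy, and with it the ETag.
    status = 304 if etag == f'"{uid}"' else 200
    return status, response, uid


status, resp, uid = handle_tracking_request({})  # first visit
print(status)                                    # 200
# The user clears cookies but not the cache: only If-None-Match comes back.
status, resp, again = handle_tracking_request({"If-None-Match": resp["ETag"]})
print(status, again == uid)                      # 304 True
```

Because a cache validator is being reused as an identifier, clearing cookies alone does not stop this kind of tracking.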

On the day the report was published, Hulu and Spotify announced that they had suspended their use of KISSmetrics pending an investigation. [25] Two consumers sued KISSmetrics, alleging that it had violated user privacy. [26] KISSmetrics revised its privacy policy over the weekend, stating that the company fully respected customers' wishes if they chose not to be tracked. On August 4, 2011, KISSmetrics' CEO Hiten Shah denied that KISSmetrics used Evercookie or the other tracking mechanisms described in the report, and claimed that the company used only legitimate first-party cookie trackers. [1] On October 19, 2012, KISSmetrics agreed to pay over $500,000 to settle the lawsuit and promised to refrain from using Evercookie. [27] [28]

NSA Tor tracking

In 2013, an internal National Security Agency (NSA) presentation revealed by Edward Snowden suggested that Evercookie could be used in government surveillance to track Tor users. [5] [29] The Tor Blog responded to the leaked document in a post, stating that the Tor Browser Bundle and the Tails operating system provide strong protection against Evercookie. [30] [31]

Public attitudes towards data tracking

Evercookie, like many other newly emerged persistent tracking technologies, is a response to Internet users' tendency to delete stored cookies. In this exchange of information, some consumers believe they are compensated with greater personalization, or sometimes even with financial compensation from the companies involved. [32] Recent research, however, shows a gap between the expectations of consumers and those of marketers. [33] A Wall Street Journal survey found that 72% of respondents felt offended when they saw targeted advertisements while browsing the Internet. Another survey found that 66% of Americans felt negative about how marketers track their data to generate individualized information. In another survey, 52% of respondents said they would like to turn off behavioral advertising. [34] Nevertheless, data tracking persists. [35] [36]


References

1. Bujlow, Tomasz; Carela-Espanol, Valentin; Lee, Beom-Ryeol; Barlet-Ros, Pere (2017). "A Survey on Web Tracking: Mechanisms, Implications, and Defenses". Proceedings of the IEEE. 105 (8): 1476–1510. doi:10.1109/jproc.2016.2637878. hdl:2117/108437. ISSN 0018-9219. S2CID 2662250.
2. Acar, Gunes; Eubank, Christian; Englehardt, Steven; Juarez, Marc; Narayanan, Arvind; Diaz, Claudia (2014). "The Web Never Forgets". Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. New York, New York, USA: ACM Press. pp. 674–689. doi:10.1145/2660267.2660347. ISBN 978-1-4503-2957-6. S2CID 8127620.
3. Bashir, Muhammad Ahmad; Wilson, Christo (2018-10-01). "Diffusion of User Tracking Data in the Online Advertising Ecosystem". Proceedings on Privacy Enhancing Technologies. 2018 (4): 85–103. doi:10.1515/popets-2018-0033. ISSN 2299-0984. S2CID 52088002.
4. Kramár, Tomáš; Barla, Michal; Bieliková, Mária (2013-02-01). "Personalizing search using socially enhanced interest model, built from the stream of user's activity". Journal of Web Engineering. 12 (1–2): 65–92. ISSN 1540-9589.
5. Kobusińska, Anna; Pawluczuk, Kamil; Brzeziński, Jerzy (2018). "Big Data fingerprinting information analytics for sustainability". Future Generation Computer Systems. 86: 1321–1337. doi:10.1016/j.future.2017.12.061. ISSN 0167-739X. S2CID 49646910.
6. Koop, Martin; Tews, Erik; Katzenbeisser, Stefan (2020-10-01). "In-Depth Evaluation of Redirect Tracking and Link Usage". Proceedings on Privacy Enhancing Technologies. 2020 (4): 394–413. doi:10.2478/popets-2020-0079. ISSN 2299-0984.
7. Al-Fannah, Nasser Mohammed; Mitchell, Chris (2020-01-07). "Too little too late: can we control browser fingerprinting?". Journal of Intellectual Capital. 21 (2): 165–180. doi:10.1108/jic-04-2019-0067. ISSN 1469-1930. S2CID 212957853.
8. Yang, Zhiju; Yue, Chuan (2020-04-01). "A Comparative Measurement Study of Web Tracking on Mobile and Desktop Environments". Proceedings on Privacy Enhancing Technologies. Retrieved 2020-12-11.
9. Yue, Chuan; Xie, Mengjun; Wang, Haining (September 2010). "An automatic HTTP cookie management system". Computer Networks. 54 (13): 2182–2198. doi:10.1016/j.comnet.2010.03.006. ISSN 1389-1286.
10. Fouad, Imane; Bielova, Nataliia; Legout, Arnaud; Sarafijanovic-Djukic, Natasa (2020-04-01). "Missed by Filter Lists: Detecting Unknown Third-Party Trackers with Invisible Pixels". Proceedings on Privacy Enhancing Technologies. 2020 (2): 499–518. arXiv:1812.01514. doi:10.2478/popets-2020-0038. ISSN 2299-0984.
11. Cook, John; Nithyanand, Rishab; Shafiq, Zubair (2020-01-01). "Inferring Tracker-Advertiser Relationships in the Online Advertising Ecosystem using Header Bidding". Proceedings on Privacy Enhancing Technologies. 2020 (1): 65–82. arXiv:1907.07275. doi:10.2478/popets-2020-0005. ISSN 2299-0984.
12. Acar, Gunes; Eubank, Christian; Englehardt, Steven; Juarez, Marc; Narayanan, Arvind; Diaz, Claudia (2014). "The Web Never Forgets". Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. Scottsdale, Arizona, USA: ACM Press. pp. 674–689. doi:10.1145/2660267.2660347. ISBN 978-1-4503-2957-6. S2CID 8127620.
13. Soltani, Ashkan; Canty, Shannon; Mayo, Quentin; Thomas, Lauren; Hoofnagle, Chris Jay (2009-08-10). "Flash Cookies and Privacy". Rochester, NY. doi:10.2139/ssrn.1446862. S2CID 6414306. SSRN 1446862.
14. Ayenson, Mika D.; Wambach, Dietrich James; Soltani, Ashkan; Good, Nathan; Hoofnagle, Chris Jay (2011-07-29). "Flash Cookies and Privacy II: Now with HTML5 and ETag Respawning". Rochester, NY. doi:10.2139/ssrn.1898390. SSRN 1898390.
15. Andrés, José Angel González (2011-07-01). "Identity Denial in Internet". Inteligencia y Seguridad. 2011 (10): 75–101. doi:10.5211/iys.10.article6. ISSN 1887-293X.
16. "Samy Kamkar - Evercookie".
17. "Evercookie source code". GitHub. 2010-10-13. Retrieved 2010-10-28.
18. "Schneier on Security - Evercookies". 2010-09-23. Retrieved 2010-10-28.
19. "Tackling Cross-Site Scripting (XSS) Attacks in Cyberspace". Securing Cyber-Physical Systems. CRC Press. pp. 350–367. 2015-10-06. doi:10.1201/b19311-18. ISBN 978-0-429-09104-9. Retrieved 2020-12-11.
20. "It is possible to kill the evercookie". 2010-10-27.
21. Vega, Tanzina (2010-10-11). "New Web Code Draws Concern Over Privacy Risks". The New York Times. ISSN 0362-4331. Retrieved 2020-12-06.
22. Lennon, Mike (2010-11-10). "Nevercookie Eats Evercookie With New Firefox Plugin". Retrieved 2022-07-25.
23. Nielsen, Janne (2019-10-02). "Experimenting with computational methods for large-scale studies of tracking technologies in web archives". Internet Histories. 3 (3–4): 293–315. doi:10.1080/24701475.2019.1671074. ISSN 2470-1475. S2CID 208121899.
24. Samarasinghe, Nayanamana; Mannan, Mohammad (November 2019). "Towards a global perspective on web tracking". Computers & Security. 87: 101569. doi:10.1016/j.cose.2019.101569. ISSN 0167-4048. S2CID 199582679.
25. "Researchers Call Out Websites for Tracking Users via Stealth Tactics". Berkeley Law. 2011-08-10. Retrieved 2020-12-06.
26. "KISSmetrics, Hulu Sued Over New Tracking Technology". www.mediapost.com. Retrieved 2020-12-06.
27. "KISSmetrics Settles Supercookies Lawsuit". www.mediapost.com. Retrieved 2020-12-06.
28. Drury, Alexandra (2012). "How Internet Users' Identities Are Being Tracked and Used". Tulane Journal of Technology & Intellectual Property. 15. ISSN 2169-4567.
29. "Tor stinks" (PDF). edwardsnowden.com.
30. "TOR attacked – possibly by the NSA". Network Security. 2013 (8): 1–2. August 2013. doi:10.1016/s1353-4858(13)70086-2. ISSN 1353-4858.
31. Vlajic, Natalija; Madani, Pooria; Nguyen, Ethan (2018-04-03). "Clickstream tracking of TOR users: may be easier than you think". Journal of Cyber Security Technology. 2 (2): 92–108. doi:10.1080/23742917.2018.1518060. ISSN 2374-2917. S2CID 169615236.
32. Martin, Kelly D.; Murphy, Patrick E. (2016-09-22). "The role of data privacy in marketing". Journal of the Academy of Marketing Science. 45 (2): 135–155. doi:10.1007/s11747-016-0495-4. ISSN 0092-0703. S2CID 168554897.
33. Chen, G.; Cox, J. H.; Uluagac, A. S.; Copeland, J. A. (Third Quarter 2016). "In-Depth Survey of Digital Advertising Technologies". IEEE Communications Surveys and Tutorials. 18 (3): 2124–2148. doi:10.1109/COMST.2016.2519912. ISSN 1553-877X. S2CID 32263374.
34. Korolova, A. (December 2010). "Privacy Violations Using Microtargeted Ads: A Case Study". 2010 IEEE International Conference on Data Mining Workshops. pp. 474–482. doi:10.1109/ICDMW.2010.137. ISBN 978-1-4244-9244-2. S2CID 206785467.
35. Mellet, Kevin; Beauvisage, Thomas (2019-09-02). "Cookie monsters. Anatomy of a digital market infrastructure". Consumption Markets & Culture. 23 (2): 110–129. doi:10.1080/10253866.2019.1661246. ISSN 1025-3866. S2CID 203058303.
36. Raley, Rita (2013). "Dataveillance and Countervailance". "Raw Data" Is an Oxymoron. The MIT Press. pp. 121–146. doi:10.7551/mitpress/9302.003.0009. ISBN 978-0-262-31232-5. S2CID 199828237. Retrieved 2020-12-11.