Web tracking

Web tracking is the practice by which operators of websites and third parties collect, store and share information about visitors' activities on the World Wide Web. Analysis of a user's behaviour allows the operator to infer their preferences and provide tailored content, and the resulting data may be of interest to various parties, such as advertisers. [1] [2] Web tracking can be part of visitor management. [3]

Uses

The uses of web tracking include the following:

Methods

IP address

Every device connected to the Internet is assigned a unique IP address, which devices need in order to communicate with each other. With appropriate software on the host website, the IP address of visitors to the site can be logged and can also be used to determine the visitor's geographical location. [8] [9] Logging the IP address can, for example, reveal whether a person voted more than once, as well as their viewing pattern. The visitor's location indicates, among other things, their country. This may affect the currency in which prices are quoted, the price or range of goods available, and any special conditions that apply; in some cases, requests from or responses to a certain country are blocked entirely. Internet users may use a VPN connection to circumvent censorship and geo-blocking and to conceal their identity and location.
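As a minimal sketch of the repeat-vote check described above, the following Python example counts how often each address appears per poll in a log. The function name `repeat_voters`, the log layout and the poll identifiers are hypothetical, and the addresses come from ranges reserved for documentation. Treating one address as one person is an assumption, since several people may share an address.

```python
from collections import Counter


def repeat_voters(vote_log):
    """Return the IP addresses that voted more than once in the same poll.

    vote_log is an iterable of (ip_address, poll_id) pairs, an assumed
    layout for this sketch rather than any real server's log format.
    """
    counts = Counter(vote_log)
    return sorted({ip for (ip, _poll), n in counts.items() if n > 1})


votes = [
    ("203.0.113.7", "poll-1"),
    ("198.51.100.4", "poll-1"),
    ("203.0.113.7", "poll-1"),   # second vote from the same address
    ("198.51.100.4", "poll-2"),  # different poll, so not a repeat
]
print(repeat_voters(votes))
```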

HTTP cookie

An HTTP cookie is a small piece of data stored on a user's device by a website when the user visits the website. [10] The website can then retrieve the information in the cookie on subsequent visits by the user. Cookies can be used to customise the user's browsing experience and to deliver targeted ads. [11] Cookies can also store records of a user's browsing activities.
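The store-and-retrieve cycle can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id`, the 30-day lifetime and the helper names are assumptions made for illustration; real sites choose their own names and lifetimes.

```python
import uuid
from http.cookies import SimpleCookie


def issue_cookie(visitor_id=None):
    """Build a Set-Cookie header line that stores an identifier on the device."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id or uuid.uuid4().hex
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["max-age"] = 60 * 60 * 24 * 30  # assumed 30-day lifetime
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header):
    """Recover the identifier from the Cookie header sent on a later visit."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


print(issue_cookie("abc123"))
print(read_cookie("visitor_id=abc123; theme=dark"))
```

On the first visit the server sends the header built by `issue_cookie`; on later visits the browser returns the value, which `read_cookie` extracts so the visits can be linked.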

First- and third-party cookies

A first-party cookie is created by the website the user is visiting. These cookies are generally regarded as benign, since they help the user rather than spy on them. The main goal of first-party cookies is to recognize the user and their preferences so that their desired settings can be applied. [12]

A third-party cookie is created by websites other than the one a user visits. They insert additional tracking code that can record a user's online activity. On-site analytics refers to data collection on the current site. It is used to measure many aspects of user interactions, including the number of times a user visits. [13]

Some tracking companies bypass the restrictions on third-party cookies introduced by web browsers using a technique called CNAME cloaking, in which a third-party tracking service is assigned a DNS record (usually a CNAME record) in the first-party origin domain, so that it masquerades as first-party even though it is a separate entity in legal and organizational terms. Some browsers and ad blockers block this technique using block lists of known trackers. [14] [15]
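The block-list check can be illustrated with a short Python sketch. It assumes the CNAME chain for a hostname has already been resolved by some other means and only shows the comparison against a list of known trackers. The function name and all hostnames are hypothetical, and this is not the implementation used by any particular browser or ad blocker.

```python
def is_cloaked_tracker(hostname, cname_chain, site_domain, blocklist):
    """Return True if a first-party-looking hostname resolves to a known tracker.

    cname_chain holds the CNAME targets already looked up for hostname;
    blocklist holds the domains of known tracking services.
    """
    def under(host, domain):
        # True when host equals domain or is a subdomain of it.
        host = host.rstrip(".").lower()
        domain = domain.rstrip(".").lower()
        return host == domain or host.endswith("." + domain)

    if not under(hostname, site_domain):
        return False  # an ordinary third-party host, not a cloaked one
    return any(under(target, tracker)
               for target in cname_chain
               for tracker in blocklist)


known_trackers = {"tracker.invalid"}
print(is_cloaked_tracker("metrics.example.com",
                         ["example-com.tracker.invalid"],
                         "example.com", known_trackers))
```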

ETags

ETags can be used to track unique users, [16] as HTTP cookies are increasingly being deleted by privacy-aware users. In July 2011, Ashkan Soltani and a team of researchers at UC Berkeley reported that a number of websites, including Hulu, were using ETags for tracking purposes. [17] Hulu and KISSmetrics both ceased "respawning" as of 29 July 2011, [18] as KISSmetrics and over 20 of its clients faced a class-action lawsuit over the use of "undeletable" tracking cookies, partially involving the use of ETags. [19]

Because ETags are cached by the browser and returned with subsequent requests for the same resource, a tracking server can simply repeat any ETag received from the browser to ensure an assigned ETag persists indefinitely (in a similar way to persistent cookies). Additional caching headers can also enhance the preservation of ETag data. [20]
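The mechanism can be sketched as a simplified request handler in Python. The function name, the dictionary of visit counts and the choice of `Cache-Control: no-cache` (which asks the browser to revalidate the resource before reuse) are assumptions for illustration, not a description of any specific tracker.

```python
import uuid


def handle_request(request_headers, visits):
    """Serve a tracked resource, using the ETag value as a visitor identifier."""
    etag = request_headers.get("If-None-Match")
    if etag is None:
        # First visit: mint a new identifier and hand it out as the ETag.
        etag = '"' + uuid.uuid4().hex + '"'
        status = 200
    else:
        # Returning visitor: the browser sent back the cached ETag.
        status = 304
    visits[etag] = visits.get(etag, 0) + 1
    # Repeating the same ETag keeps the identifier alive in the cache.
    response_headers = {"ETag": etag, "Cache-Control": "no-cache"}
    return status, response_headers


visits = {}
status, headers = handle_request({}, visits)
status2, headers2 = handle_request({"If-None-Match": headers["ETag"]}, visits)
print(status, status2, visits[headers["ETag"]])
```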

ETags can usually be flushed by clearing the browser cache, although implementations vary.

Other methods

Unethical nature of web tracking

Web browsing is linked to a user's personal information. Location, interests, purchases, and more can be revealed just by which pages a user visits. This allows trackers to draw conclusions about a user and analyze patterns of activity. [33] Use of web tracking can be considered unethical when applied to private individuals, and it is to varying degrees subject to legislation such as the EU's eCommerce Directive and the UK's Data Protection Act. When it is done without the knowledge of a user, it is considered a breach of browser security.

Justification

In a business-to-business context, understanding a visitor's behavior in order to identify buying intentions is seen by many commercial organizations as an effective way to target marketing activities. [34] Visiting companies can be approached, both online and offline, with marketing and sales propositions which are relevant to their current requirements. From the point of view of a sales organization, engaging with a potential customer when they are actively looking to buy can produce savings in otherwise wasted marketing funds.

Prevention

The most advanced protection tools include Firefox's tracking protection and the browser add-ons uBlock Origin and Privacy Badger. [32] [35] [36]

Other measures include the browser add-on NoScript, the use of an alternative search engine such as DuckDuckGo, and the use of a VPN. However, VPNs cost money, and as of 2023 NoScript may "make general web browsing a pain". [36]

On mobile

On mobile, the most advanced method may be the use of the Firefox Focus browser, which mitigates web tracking to a large extent. It includes Total Cookie Protection and works similarly to the private mode in the conventional Firefox browser. [37] [38] [39]

Opt-out requests

Users can also control third-party web tracking to some extent by other means. Opt-out cookies let users block websites from installing future cookies. Websites may be blocked from installing third-party advertiser cookies on a browser, which prevents those advertisers from tracking the user on the page. [40] Do Not Track is a web browser setting that asks web applications to disable tracking of the user. When it is enabled, the browser sends a request to each website the user visits to voluntarily disable its cross-site user tracking.
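A site that chooses to honour the Do Not Track request could check for it as in the following Python sketch. The request is carried in the `DNT` HTTP header with the value `1`; the function name and the plain dictionary of headers are assumptions made for illustration.

```python
def wants_no_tracking(request_headers):
    """Return True if the request carries the Do Not Track signal (DNT: 1)."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalized = {name.lower(): value.strip()
                  for name, value in request_headers.items()}
    return normalized.get("dnt") == "1"


print(wants_no_tracking({"DNT": "1", "User-Agent": "ExampleBrowser"}))
print(wants_no_tracking({"User-Agent": "ExampleBrowser"}))
```

Because compliance is voluntary, such a check only has an effect on sites that decide to act on it.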

Privacy mode

Contrary to popular belief, browser privacy mode does not prevent all tracking attempts, because it usually only blocks the storage of information, such as cookies, on the visitor's device. It does not help against the various fingerprinting methods, and such fingerprints can be de-anonymized. [41] In addition, website functionality often fails in this mode: for example, the user may be unable to log in to the site, or their preferences may be lost. [citation needed]

Browsers

Some web browsers use "tracking protection" or "tracking prevention" features to block web trackers. [42] The teams behind the NoScript and uBlock add-ons have assisted with developing Firefox's SmartBlock capabilities. [43]

Search engines

To safeguard user data from tracking by search engines, various privacy-focused search engines have been developed as alternatives. Examples include DuckDuckGo, MetaGer, and Swisscows, which prioritize preventing the storage and tracking of user activity. While these alternatives offer enhanced privacy, some may not guarantee complete anonymity, and a few might be less user-friendly than mainstream search engines such as Google and Microsoft Bing. [44]

See also

References

  1. D. Sundarasen, Sheela Devi (2019-04-08). "Institutional characteristics, signaling variables and IPO initial returns". PSU Research Review. 3 (1): 29–49. doi:10.1108/prr-10-2016-0003. ISSN 2399-1747.
  2. Samarasinghe, Nayanamana; Mannan, Mohammad (2019-11-01). "Towards a global perspective on web tracking". Computers & Security. 87: 101569. doi:10.1016/j.cose.2019.101569. S2CID 199582679.
  3. Nielsen, Janne (2021-04-27). "Using mixed methods to study the historical use of web beacons in web tracking". International Journal of Digital Humanities. 2 (1–3): 65–88. doi:10.1007/s42803-021-00033-4. ISSN 2524-7832. S2CID 233416836.
  4. "Internet Safety: Understanding Browser Tracking". GCFGlobal.org. Retrieved 2019-12-13.
  5. Valentino-DeVries, Jennifer (2019-04-13). "Tracking Phones, Google Is a Dragnet for the Police (Published 2019)". The New York Times. ISSN 0362-4331. Archived from the original on 2022-10-30. Retrieved 2020-10-23.
  6. Kleinberg, Samantha; Mishra, Bud (2008). "PSST". Proceedings of the 17th international conference on World Wide Web. New York, New York, USA: ACM Press. pp. 1143–1144. doi:10.1145/1367497.1367697. ISBN 9781605580852. S2CID 15179069.
  7. "What is Usability Testing?". The Interaction Design Foundation. Retrieved 2019-12-13.
  8. "What is an IP address?". HowStuffWorks. 2001-01-12. Retrieved 2019-12-13.
  9. "How cookies track you around the web & how to stop them". Privacy.net. 2018-02-24. Retrieved 2019-12-13.
  10. Kobusińska, Anna; Pawluczuk, Kamil; Brzeziński, Jerzy (2018). "Big Data fingerprinting information analytics for sustainability". Future Generation Computer Systems. 86: 1321–1337. doi:10.1016/j.future.2017.12.061. S2CID 49646910.
  11. Martin, Kirsten (2015-12-22). "Data aggregators, consumer data, and responsibility online: Who is tracking consumers online and should they stop?". The Information Society. 32 (1): 51–63. doi:10.1080/01972243.2015.1107166. ISSN 0197-2243. S2CID 205509140.
  12. "What are first-party cookies?". IONOS Digitalguide. Retrieved 2022-01-13.
  13. Loshin, David; Reifer, Abie (2013-01-01), Loshin, David; Reifer, Abie (eds.), "Chapter 4. Customer Lifetime and Value Analytics", Using Information to Develop a Culture of Customer Centricity, Morgan Kaufmann, pp. 23–31, ISBN 9780124105430, retrieved 2019-11-11.
  14. "Online Trackers Are Now Shifting To New Invasive CNAME Cloaking Technique". The Hack Report. 2021-02-27. Retrieved 2021-04-14.
  15. Dimova, Yana; Acar, Gunes; Olejnik, Lukasz; Joosen, Wouter; Van Goethem, Tom (2021-02-23). "The CNAME of the Game: Large-scale Analysis of DNS-based Tracking Evasion". arXiv:2102.09301 [cs.CR].
  16. "tracking without cookies". 17 February 2003.
  17. Ayenson, Mika D.; Wambach, Dietrich James; Soltani, Ashkan; Good, Nathan; Hoofnagle, Chris Jay (29 July 2011). "Flash Cookies and Privacy II: Now with HTML5 and ETag Respawning". SSRN 1898390.
  18. Soltani, Ashkan (11 August 2011). "Flash Cookies and Privacy II". askhansoltani.org. Retrieved 2023-06-27.
  19. Anthony, Sebastian (2011-08-04). "AOL, Spotify, GigaOm, Etsy, KISSmetrics sued over undeletable tracking cookies". ExtremeTech. Retrieved 2023-06-27.
  20. "Cookieless cookies". GitHub lucb1e. 2013-08-25. Retrieved 2023-06-27.
  21. Andrea Fortuna (2017-11-06). "What is Canvas Fingerprinting and how the companies use it to track you online | So Long, and Thanks for All the Fish". Retrieved 2019-12-13.
  22. BigCommerce (2019-12-12). "What is cross-device tracking?". BigCommerce. Retrieved 2019-12-13.
  23. "What is online tracking and how do websites track you?". Koofr blog. Retrieved 2019-12-13.
  24. "Cookies - Definition - Trend Micro USA". www.trendmicro.com. Retrieved 2019-12-13.
  25. "Session replay", Wikipedia, 2019-10-15, retrieved 2019-12-13.
  26. "FullStory | Build a More Perfect Digital Experience | FullStory". www.fullstory.com. Retrieved 2021-04-05.
  27. "Redirect tracking protection - Privacy, permissions, and information security | MDN". developer.mozilla.org. Retrieved 2022-06-29.
  28. Goodin, Dan (2021-02-19). "New browser-tracking hack works even when you flush caches or go incognito". Ars Technica. Retrieved 2021-02-21.
  29. "Federated Learning Component". source.chromium.org. Retrieved 2023-02-27.
  30. Cyphers, Bennett (2021-03-03). "Google's FLoC Is a Terrible Idea". Electronic Frontier Foundation. Retrieved 2021-03-05.
  31. Patringenaru, Ioana. "New web tracking technique is bypassing privacy protections". University of California-San Diego via techxplore.com. Retrieved 18 January 2023.
  32. Randall, Audrey; Snyder, Peter; Ukani, Alisha; Snoeren, Alex C.; Voelker, Geoffrey M.; Savage, Stefan; Schulman, Aaron (25 October 2022). "Measuring UID smuggling in the wild". Proceedings of the 22nd ACM Internet Measurement Conference. Association for Computing Machinery. pp. 230–243. doi:10.1145/3517745.3561415. ISBN 9781450392594. S2CID 250494286.
  33. Mayer, J. R.; Mitchell, J. C. (May 2012). "Third-Party Web Tracking: Policy and Technology". 2012 IEEE Symposium on Security and Privacy. pp. 413–427. CiteSeerX 10.1.1.388.5781. doi:10.1109/SP.2012.47. ISBN 978-1-4673-1244-8. S2CID 14652884.
  34. "Website visitor tracking going too far?". Prospectvision.net. Archived from the original on 2012-07-19. Retrieved 2012-08-03.
  35. Wallen, Jack (24 October 2018). "How to use Ublock Origin and Privacy Badger to prevent browser tracking in Firefox". TechRepublic. Retrieved 3 February 2023.
  36. "Our Favorite Ad Blockers and Browser Extensions to Protect Privacy". The New York Times. 10 January 2023. Retrieved 3 February 2023.
  37. "Mozilla unveils Total Cookie Protection for Firefox Focus on Android". ZDNET. Retrieved 3 February 2023.
  38. Chen, Brian X. (31 March 2021). "If You Care About Privacy, It's Time to Try a New Web Browser". The New York Times. Retrieved 3 February 2023.
  39. "Firefox enables its anti-tracking feature by default". Engadget. Retrieved 3 February 2023.
  40. "What is an Opt Out Cookie? - All about Cookies". www.allaboutcookies.org. 27 September 2018. Retrieved 2019-11-11.
  41. "Think you're anonymous online? A third of popular websites are 'fingerprinting' you". Washington Post.
  42. "Firefox 42.0 release notes".
  43. Katz, Sarah. "Firefox 87 reveals SmartBlock for private browsing". techxplore.com. Retrieved 3 February 2023.
  44. Abdulaziz Saad Bubukayr, Maryam; Frikha, Mounir (2022). "Web Tracking Domain and Possible Privacy Defending Tools: A Literature Review". Journal of Cybersecurity. 4 (2): 79–94. doi:10.32604/jcs.2022.029020. ISSN 2579-0064.
  45. "What is the Definition of Online Privacy? | Winston & Strawn Legal Glossary". Winston & Strawn. Retrieved 2019-12-13.
  46. "Web Analytics Basics". www.usability.gov. 2013-10-08. Retrieved 2019-12-13.
  47. Beal, Vangie (22 January 2002). "What is Web Beacon? Webopedia Definition". www.webopedia.com. Retrieved 2019-12-13.