Web beacon

Last updated

A web beacon [note 1] is a technique used on web pages and email to unobtrusively (usually invisibly) allow checking that a user has accessed some content. [1] Web beacons are typically used by third parties to monitor the activity of users at a website for the purpose of web analytics or page tagging. [2] They can also be used for email tracking. [3] When implemented using JavaScript, they may be called JavaScript tags. [4] Web beacons are unseen HTML elements that track a webpage views. Upon the user revisiting the webpage, these beacons are connected to cookies established by the server, facilitating undisclosed user tracking. [5]

Contents

Using such beacons, companies and organizations can track the online behavior of web users. At first, the companies doing such tracking were mainly advertisers or web analytics companies; later social media sites also started to use such tracking techniques, for instance through the use of buttons that act as tracking beacons.

In 2017, W3C published a candidate specification for an interface that web developers can use to create web beacons. [6]

An innocuous web beacon referencing a file on Wikimedia Commons No image.svg
An innocuous web beacon referencing a file on Wikimedia Commons

Overview

A web beacon is any of several techniques used to track who is visiting a web page. They can also be used to see if an email was read or forwarded or if a web page was copied to another website. [7]

The first web beacons were small digital image files that were embedded in a web page or email. The image could be as small as a single pixel (a "tracking pixel") and could have the same colour as the background, or be completely transparent. [8] When a user opens the page or email where such an image is embedded, they might not see the image, but their web browser or email reader automatically downloads the image, requiring the user's computer to send a request to the host company's server, where the source image is stored. This request provides identifying information about the computer, allowing the host to keep track of the user.

This basic technique has been developed further so that many types of elements can be used as beacons. Currently, these can include visible elements such as graphics, banners, or buttons, but also non-pictorial HTML elements such as the frame, style, script, input link, embed, object, etc., of an email or web page.

The identifying information provided by the user's computer typically includes its IP address, the time the request was made, the type of web browser or email reader that made the request, and the existence of cookies previously sent by the host server. The host server can store all of this information, and associate it with a session identifier or tracking token that uniquely marks the interaction.

Use by companies

Once a company can identify a particular user, the company can then track that user's behavior across multiple interactions with different websites or web servers. As an example, consider a company that owns a network of websites. This company could store all of its images on one particular server, but store the other contents of its web pages on a variety of other servers. For instance, each server could be specific to a given website, and could even be located in a different city. But the company could use web beacons requesting data from its one image server to count and recognize individual users who visit different websites. Rather than gathering statistics and managing cookies for each server independently, the company can analyze all this data together, and track the behavior of individual users across all the different websites, assembling a profile of each user as they navigate through these different environments.

Email tracking

Web beacons embedded in emails have greater privacy implications than beacons embedded in web pages. Through the use of an embedded beacon, the sender of an email – or even a third party – can record the same sort of information as an advertiser on a website, namely the time that the email was read, the IP address of the computer that was used to read the email (or the IP address of the proxy server that the reader went through), the type of software used to read the email, and the existence of any cookies previously sent. In this way, the sender – or a third party – can gather detailed information about when and where each particular recipient reads their email. Every subsequent time the email message is displayed, the same information can be sent again to the sender or third party.

"Return-receipt-to" (RRT) email headers can also trigger sending of information and these may be seen as another form of a web beacon. [9]

Web beacons are used by email marketers, spammers, and phishers to verify that an email is read. Using this system, they can send similar emails to a large number of addresses and then check which ones are valid. Valid in this case means that the address is actually in use, that the email has made it past spam filters, and that the content of the email is actually viewed.

To some extent, this kind of email tracking can be prevented by configuring the email reader software to avoid accessing remote images.

One way to neutralize such email tracking is to disconnect from the Internet after downloading email but before reading the downloaded messages. (Note that this assumes one is using an email reader that resides on one's own computer and downloads the emails from the email server to one's own computer.) In that case, messages containing beacons will not be able to trigger requests to the beacons' host servers, and the tracking will be prevented. But one would then have to delete any messages suspected of containing beacons or risk having the beacons activate again once the computer is reconnected to the Internet.

Web beacons can also be filtered out at the server level so that they never reach the end-user.

Beacon API

The Beacon API (application programming interface) is a candidate recommendation of the World Wide Web Consortium, the standards organization for the web. [10] It is a standardized API that directs the email client to silently send tracking data back to the server, i.e., without alerting the user and thus disturbing their experience. [11]

Use of this Beacon API enables user tracking and profiling without the end-user's awareness, as it is invisible to them, and without delaying or otherwise interfering with navigation within or away from the site. [12] Support for the Beacon API was introduced into Mozilla's Firefox browser in February 2014 [13] and in Google's Chrome browser in November 2014. [14]

Notes

  1. Also called web bug, tracking bug, tag, web tag, page tag, tracking pixel, pixel tag, 1×1 GIF, or clear GIF.

Related Research Articles

<span class="mw-page-title-main">World Wide Web</span> Linked hypertext system on the Internet

The World Wide Web is an information system that enables content sharing over the Internet through user-friendly ways meant to appeal to users beyond IT specialists and hobbyists. It allows documents and other web resources to be accessed over the Internet according to specific rules of the Hypertext Transfer Protocol (HTTP).

<span class="mw-page-title-main">Website</span> Set of related web pages served from a single domain

A website is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Websites are typically dedicated to a particular topic or purpose, such as news, education, commerce, entertainment, or social media. Hyperlinking between web pages guides the navigation of the site, which often starts with a home page. The most-visited sites are Google, YouTube, and Facebook.

Inline linking is the use of a linked object, often an image, on one site by a web page belonging to a second site. One site is said to have an inline link to the other site where the object is located.

A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML document, choosing the appearance of a page, or jumping to positions in multimedia content.

Internet privacy involves the right or mandate of personal privacy concerning the storage, re-purposing, provision to third parties, and display of information pertaining to oneself via the Internet. Internet privacy is a subset of data privacy. Privacy concerns have been articulated from the beginnings of large-scale computer sharing and especially relate to mass surveillance.

The Platform for Privacy Preferences Project (P3P) is an obsolete protocol allowing websites to declare their intended use of information they collect about web browser users. Designed to give users more control of their personal information when browsing, P3P was developed by the World Wide Web Consortium (W3C) and officially recommended on April 16, 2002. Development ceased shortly thereafter and there have been very few implementations of P3P. Internet Explorer and Microsoft Edge were the only major browsers to support P3P. Microsoft has ended support from Windows 10 onwards. Internet Explorer and Edge on Windows 10 no longer support P3P. The president of TRUSTe has stated that P3P has not been implemented widely due to the difficulty and lack of value.

Push technology, also known as server push, refers to a communication method, where the communication is initiated by a server rather than a client. This approach is different from the "pull" method where the communication is initiated by a client.

Web analytics is the measurement, collection, analysis, and reporting of web data to understand and optimize web usage. Web analytics is not just a process for measuring web traffic but can be used as a tool for business and market research and assess and improve website effectiveness. Web analytics applications can also help companies measure the results of traditional print or broadcast advertising campaigns. It can be used to estimate how traffic to a website changes after launching a new advertising campaign. Web analytics provides information about the number of visitors to a website and the number of page views, or create user behavior profiles. It helps gauge traffic and popularity trends, which is useful for market research.

The canvas element is part of HTML5 and allows for dynamic, scriptable rendering of 2D shapes and bitmap images. It is a low level, procedural model that updates a bitmap. HTML5 Canvas also helps in making 2D games.

Emailtracking is a method for monitoring whether the email message is read by the intended recipient. Most tracking technologies use some form of digitally time-stamped record to reveal the exact time and date when an email is received or opened, as well as the IP address of the recipient.

<span class="mw-page-title-main">Google Analytics</span> Web analytics service from Google

Google Analytics is a web analytics service offered by Google that tracks and reports website traffic and also the mobile app traffic & events, currently as a platform inside the Google Marketing Platform brand. Google launched the service in November 2005 after acquiring Urchin.

<span class="mw-page-title-main">HTTP cookie</span> Small pieces of data stored by a web browser while on a website

HTTP cookies are small blocks of data created by a web server while a user is browsing a website and placed on the user's computer or other device by the user's web browser. Cookies are placed on the device used to access a website, and more than one cookie may be placed on a user's device during a session.

A local shared object (LSO), commonly called a Flash cookie, is a piece of data that websites that use Adobe Flash may store on a user's computer. Local shared objects have been used by all versions of Flash Player since version 6.

<span class="mw-page-title-main">Epic (web browser)</span> Indian Web Browser based on chromium

Epic is a proprietary privacy-centric web browser. It was developed by Hidden Reflex, a software product company founded by Alok Bhardwaj, using Chromium source code. Epic is always in private browsing mode, and exiting the browser deletes all browser data. The browser's developers claim that Google's tracking code has been removed, and that blocks other companies from tracking the user.

<span class="mw-page-title-main">Evercookie</span> JavaScript application programming interface

Evercookie is a JavaScript application programming interface (API) that identifies and reproduces intentionally deleted cookies on the clients' browser storage. It was created by Samy Kamkar in 2010 to demonstrate the possible infiltration from the websites that use respawning. Websites that have adopted this mechanism can identify users even if they attempt to delete the previously stored cookies.

The Web platform is a collection of technologies developed as open standards by the World Wide Web Consortium and other standardization bodies such as the Web Hypertext Application Technology Working Group, the Unicode Consortium, the Internet Engineering Task Force, and Ecma International. It is the umbrella term introduced by the World Wide Web Consortium, and in 2011 it was defined as "a platform for innovation, consolidation and cost efficiencies" by W3C CEO Jeff Jaffe. Being built on The evergreen Web has allowed for the addition of new capabilities while addressing security and privacy risks. Additionally, developers are enabled to build interoperable content on a cohesive platform.

Cross-site request forgery, also known as one-click attack or session riding and abbreviated as CSRF or XSRF, is a type of malicious exploit of a website or web application where unauthorized commands are submitted from a user that the web application trusts. There are many ways in which a malicious website can transmit such commands; specially-crafted image tags, hidden forms, and JavaScript fetch or XMLHttpRequests, for example, can all work without the user's interaction or even knowledge. Unlike cross-site scripting (XSS), which exploits the trust a user has for a particular site, CSRF exploits the trust that a site has in a user's browser. In a CSRF attack, an innocent end user is tricked by an attacker into submitting a web request that they did not intend. This may cause actions to be performed on the website that can include inadvertent client or server data leakage, change of session state, or manipulation of an end user's account.

Ghostery is a free and open-source privacy and security-related browser extension and mobile browser application. Since February 2017, it has been owned by the German company Cliqz International GmbH. The code was originally developed by David Cancel and associates.

Upload components are software products that are designed to be embedded into a web site to add upload functionality to it. Upload components are designed to replace the standard HTML4 upload mechanism. Compared with HTML4, Upload Components have a more user-friendly interface and support a wider range of features.

Spy pixels or tracker pixels are hyperlinks to remote image files in HTML email messages that have the effect of spying on the person reading the email if the image is downloaded. They are commonly embedded in the HTML of an email as small, imperceptible, transparent graphic files. Spy pixels are commonly used in marketing, and there are several countermeasures in place that aim to block email tracking pixels. However, there are few regulations in place that effectively guard against email tracking approaches.

References

  1. Stefanie Olsen (January 2, 2002). "Nearly undetectable tracking device raises concern". CNET News . Archived from the original on November 7, 2014. Retrieved May 23, 2019.
  2. Richard M. Smith (November 11, 1999). "The Web Bug FAQ". EFF.org Privacy Archive. Archived from the original on June 29, 2012. Retrieved July 12, 2012.
  3. Richard Lowe Jr And Claudia Arevalo-Lowe. "Email web bug invisible tracker collects info without permission". mailsbroadcast.com. Archived from the original on December 3, 2017. Retrieved August 22, 2016.
  4. "Negrino, Tom; Smith, Dori. JavaScript para World Wide Web. Pearson Education, 2001. accessed 1 October 2015". Archived from the original on May 12, 2016. Retrieved October 1, 2015.
  5. Payton, Anne M. (September 22, 2006). "A review of spyware campaigns and strategies to combat them". Proceedings of the 3rd annual conference on Information security curriculum development. InfoSecCD '06. New York, NY, USA: Association for Computing Machinery. pp. 136–141. doi:10.1145/1231047.1231077. ISBN   978-1-59593-437-6.
  6. Jatinder Mann; Alois Reitbauer (April 13, 2017). "Beacon". W3C Candidate Recommendation. W3C. Archived from the original on October 27, 2019. Retrieved November 7, 2019.{{cite web}}: CS1 maint: multiple names: authors list (link)
  7. Bouguettaya, A. R. A.; Eltoweissy, M. Y. (2003). "Privacy on the Web: facts, challenges, and solutions". IEEE Security Privacy. 1 (6): 40–49. doi:10.1109/MSECP.2003.1253567. ISSN   1558-4046. Archived from the original on August 25, 2021. Retrieved March 29, 2021.
  8. Nielsen, Janne (April 27, 2021). "Using mixed methods to study the historical use of web beacons in web tracking". International Journal of Digital Humanities. 2 (1–3): 65–88. doi:10.1007/s42803-021-00033-4. ISSN   2524-7832. S2CID   233416836.
  9. See Internet Engineering Task Force memorandum RFC 4021.
  10. "Beacon W3C Candidate Recommendation 13 April 2017". Archived from the original on March 3, 2021. Retrieved July 26, 2017.
  11. Introduction to the Beacon API Archived March 9, 2021, at the Wayback Machine - Sitepoint.com, January 2015
  12. Squeezing the Most Into the New W3C Beacon API Archived October 3, 2017, at the Wayback Machine - NikCodes, 16 December 2014
  13. Navigator.sendBeacon Archived April 30, 2021, at the Wayback Machine - Mozilla Developer Network
  14. Send beacon data in Chrome 39 Archived April 13, 2021, at the Wayback Machine - developers.google.com, September 2015