Click tracking is the collection of user click behavior or navigational behavior in order to derive insights and fingerprint users. [1] [2] Click behavior is commonly tracked using server logs, which record click paths and clicked URLs (Uniform Resource Locators). [2] [3] These logs are often presented in a standard format that includes information such as the hostname, date, and username. [2] However, as technology develops, new software allows for in-depth analysis of user click behavior using hypervideo tools. [1] Given that the internet can be considered a risky environment, research strives to understand why users click certain links and not others. [4] Research has also explored the user experience of privacy, both by anonymizing users' personal identification information and by improving how data collection consent forms are written and structured. [5] [6]
Click tracking is relevant in several fields, including Human-Computer Interaction (HCI), software engineering, and advertising. [1] [7] Email tracking, link tracking, web analytics, and user research are also related concepts and applications of click tracking. [8] A common use of click data is to reorder search engine results so that their order is more relevant to users' needs. [9] Click tracking employs many modern techniques such as machine learning and data mining. [9]
Tracking and recording technologies (TRTs) can be split into two categories, institutional TRTs and end-user TRTs. [10] Institutional TRTs and end-user TRTs differ by who is collecting and storing the data, and this can be respectively understood as institutions and users. Examples of TRTs include radio frequency identification (RFID), credit cards, and store video cameras. Research suggests that individuals are concerned with privacy, but they are less concerned with how TRTs are used daily. [10] This discrepancy has been attributed to the public not understanding how information about them is getting collected. [10]
Another means of obtaining user input is eye-tracking or gaze tracking. Gaze-tracking technology is especially beneficial for those with motor disabilities. [11] Systems that employ gaze-tracking often try to mimic cursor and keyboard behavior. [11] In this process, the gaze-tracking system is separated into its own panel in the system interface, and the user experience of this system is compromised as individuals have to switch between the panel and the other interface features. The experience is also difficult because users have to first imagine how to complete the task using keyboard and cursor features and then employ gaze. This causes tasks to take additional time. [11] Hence, researchers created their own web browser called GazeTheWeb (GTW), and the focus of their research was on the user experience. They improved the interface to incorporate gaze better. [11]
Eye-movement tracking is also applied in usability testing when creating web applications. [12] However, in order to track user eye movements, a lab setting with appropriate equipment is often required. Mouse and keyboard activity can be measured remotely, so this quality can be capitalized on for usability testing. [12] Algorithms can use mouse movements to predict and trace user eye movements. Such tracking in a remote environment is called a remote logging technique. [12]
Browser fingerprinting is another means of identifying users and tracking them. [13] In this process, information about a user is collected from their web browser to create a browser fingerprint. A browser fingerprint contains information about a device, its operating system, its browser, and its configuration. HTTP headers, JavaScript, and browser plugins can be used to build a fingerprint. [13] Browser fingerprints can change over time from automatic software updates or user browser preference adjustments. Measures to increase privacy in this realm can reduce functionality by blocking features. [13]
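As a rough illustration of how collected browser attributes can be condensed into a fingerprint, the following minimal sketch hashes a set of attributes into a short identifier. The attribute names, the sample values, the function name browser_fingerprint, and the choice of SHA-256 are all assumptions made for this example; real fingerprinting scripts gather many more signals and may combine them differently.

```python
import hashlib
import json


def browser_fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical values, as might be read from HTTP headers and JavaScript.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "accept_language": "en-US,en;q=0.9",
    "screen": "1920x1080x24",
    "timezone": "Europe/Berlin",
    "plugins": "pdf-viewer",
}
fp_before = browser_fingerprint(visitor)

# An automatic browser update changes the user agent string,
# so the fingerprint derived from it changes as well.
updated = {**visitor, "user_agent": "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.1"}
fp_after = browser_fingerprint(updated)

print(fp_before, fp_after, fp_before != fp_after)
```

The second call shows why fingerprints drift over time: changing a single attribute yields an unrelated hash, so a tracker relying on exact matches would need some way to link the old and new values.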
User browsing behavior is often tracked using server access logs, which contain patterns of clicked URLs, queries, and paths. [1] However, more modern tracking software uses JavaScript to track cursor behavior. The collected mouse data can be used to create videos, allowing user behavior to be replayed and easily analyzed. Hypermedia is used to create visualizations that allow behaviors such as highlighting, hesitating, and selecting to be monitored. [1] Technology that is used to record such behavior can also be used to predict it. One of these monitoring tools, SMT2ε, collects fifteen cursor features and uses fourteen of them to predict the remaining one. [1] This software also generates a log analysis that summarizes user cursor activity. [1]
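The server access logs described above can be read with a short script. The sketch below assumes the log follows the Common Log Format, which the article does not specify; the sample line, the pattern name CLF_PATTERN, and the function name parse_log_line are invented for illustration.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method url protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Convert the timestamp and status code into typed values.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


# Made-up log entry using a documentation-reserved IP address.
sample = '203.0.113.7 - alice [10/Oct/2023:13:55:36 +0000] "GET /products/42 HTTP/1.1" 200 2326'
record = parse_log_line(sample)
print(record["host"], record["user"], record["time"].isoformat(), record["url"])
```

Each parsed record carries the hostname, username, date, and clicked URL, which is the raw material for reconstructing click paths.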
In a search session, users can be identified using cookies, the identd protocol, or their IP address. This information can then be stored in a database, and every time a user visits a web page again, their click behavior is appended to the database. DoubleClick Inc. is an example of a company that has such a database and partners with other companies to aid with their web mining. [2] Cookies are carried in HTTP (Hypertext Transfer Protocol) messages. When a user clicks on a link, they are connected to the associated web server. [3] The click is treated as a request, and the server “responds” by sending information to the user; this information is a cookie. [3] Cookies provide a “bookmark” for users’ sessions on a website, and they store user login information and the pages users visit on a website. [3] This helps preserve the state of the session. If there is more than one such server, information must be consistent among all servers; hence, information is transferred between them. Data collected via cookies can be used to improve websites for all users, and it also aids user profiling for advertising. [3]
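The cookie exchange described above can be sketched with Python's standard http.cookies module. The cookie name session_id, the helper function names, and the in-memory dictionary standing in for the tracking database are assumptions made for this example, not details from the cited sources.

```python
import secrets
from http.cookies import SimpleCookie


def issue_session_cookie():
    """Build the value of a Set-Cookie header carrying a random session ID."""
    cookie = SimpleCookie()
    cookie["session_id"] = secrets.token_hex(8)
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()


def read_session_id(cookie_header):
    """Extract the session ID from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    return morsel.value if morsel is not None else None


# First response: the server hands the browser a cookie.
set_cookie_value = issue_session_cookie()

# Later requests: the browser sends back only the name=value pair.
cookie_header = set_cookie_value.split(";", 1)[0]
sid = read_session_id(cookie_header)

# Hypothetical stand-in for the database of click behavior per user.
clicks_by_session = {}
for url in ["/home", "/products/42"]:
    clicks_by_session.setdefault(sid, []).append(url)

print(set_cookie_value)
print(clicks_by_session)
```

Because the same identifier comes back with every request, the server can append each newly clicked URL to the record it already holds for that visitor.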
When data mining techniques and statistical procedures are applied to understand web log data, the process is called log analysis or web usage mining. This helps with determining patterns in users’ navigational behaviors. [2] Some features that can be observed include how long users viewed pages, click path lengths, and the number of clicks. [2] Web usage mining has three phases. First, the log data is "preprocessed" to identify users and the content of their search sessions. Then, tools like association and clustering are applied to look for patterns, and lastly, these patterns are saved to be further analyzed. [2] Association rule mining helps with finding “patterns, associations, and correlations” among the pages users visit in a search session. Sequential pattern discovery extends association rule mining by also accounting for time, such as the page views within an allotted time period. [2] Classification is a tool that allows pages to be assigned to groups representing certain similar qualities. [2]
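The pattern-discovery phase described above can be illustrated on a toy dataset. The sketch below assumes the logs have already been preprocessed into per-session lists of pages, and it computes click path lengths plus simple pairwise association rules using the standard support and confidence measures. The session data, function names, and the 0.5 support threshold are made up for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical preprocessed sessions: one list of visited pages per session.
sessions = [
    ["/home", "/laptops", "/cart"],
    ["/home", "/laptops"],
    ["/home", "/phones", "/cart"],
    ["/laptops", "/cart"],
]


def path_lengths(sessions):
    """Number of pages in each session's click path."""
    return [len(session) for session in sessions]


def association_rules(sessions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for page pairs."""
    total = len(sessions)
    page_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        pages = sorted(set(session))  # count each page once per session
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / total  # share of sessions containing both pages
        if support < min_support:
            continue
        # Confidence: share of sessions with the antecedent that also
        # contain the consequent.
        rules.append((a, b, support, count / page_counts[a]))
        rules.append((b, a, support, count / page_counts[b]))
    return rules


print(path_lengths(sessions))
for rule in association_rules(sessions):
    print(rule)
```

This version ignores the order and timing of page views; sequential pattern discovery would additionally keep track of when each page was viewed.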
Some examples of tools individuals can use when conducting click analytics are the Google Analytics tool In-Page Analytics, ClickHeat, and Crazy Egg. [14] These tools create a visual from user click data on a webpage. [14] ClickHeat and Crazy Egg showcase the density of user clicks using specific colors, and all of these tools allow for webpage visitors to be categorized into groups by qualities like being a mobile user or using a particular browser. The specific groups' data can be analyzed for further insight. [14]
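The idea behind a click-density visual can be sketched by binning click coordinates into a grid. This is not how any of the named tools is implemented; the 100-pixel cell size, the sample clicks, the device labels, and the function name click_density are assumptions for illustration only.

```python
from collections import Counter


def click_density(clicks, cell_size=100, device=None):
    """Count clicks per grid cell, optionally for one visitor segment."""
    grid = Counter()
    for x, y, click_device in clicks:
        if device is not None and click_device != device:
            continue
        # Integer division maps a pixel coordinate to its grid cell.
        grid[(x // cell_size, y // cell_size)] += 1
    return grid


# Hypothetical clicks: (x pixel, y pixel, visitor segment).
clicks = [
    (120, 40, "mobile"),
    (150, 60, "desktop"),
    (180, 90, "mobile"),
    (640, 420, "desktop"),
]

all_visitors = click_density(clicks)
mobile_only = click_density(clicks, device="mobile")

print(all_visitors.most_common(1))
print(dict(mobile_only))
```

A heatmap renderer would then map each cell's count to a color, and the optional device filter corresponds to analyzing one visitor group at a time.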
One of the main factors users consider when clicking links is a link's position in a list of results. The closer links are to the top, the more likely they are to be selected by users. [15] When users have a personal connection to a subject matter, they tend to click articles about it more frequently. Pictures, position, and specific individuals in the news content also weigh more heavily on users’ decisions, while the source of the news is deemed less important. [15]
Click attitude and click intention play a large role in user click behavior. [8] In one study when research participants were presented with positive and negative insurance advertisement photographs, emotion was seen to have a positive association with click intention and click attitude. The researchers also observed that click attitude affects click intention, and positive emotion has more of an impact than negative emotion on click attitude. [8]
The internet can be considered a risky environment due to the abundance of cybersecurity attacks that can occur and the prevalence of malware. Hence, whenever individuals use the internet, they have to decide whether or not to click on the various links. [4] A 2018 study found that users tend to click on more URLs on websites they are familiar with; this user trait is then exploited by cybercriminals, and personal information can be compromised. Hence, trust is seen to also increase click-through intention. [4] When given Google Chrome warnings, 70% of the time people will click through. They also tend to adjust default computer settings in this process. [4] Users were also found to better recognize malware risks when there is a greater potential for revealing their personal information. [4]
Pages that are viewed by users during a particular search session constitute click data. [9] Such data can be used to improve search results in two ways, as explicit and implicit feedback. Explicit feedback is when users indicate which pages are relevant to their search query, while implicit feedback is when user behavior is interpreted to determine results’ relevance. Certain user actions on a webpage that can be used as a part of the interpretation process include bookmarking, saving, or printing a particular web page. [9] Through collecting click data from a few individuals, the relevance of results for all users for given queries can improve. In a search session, a user indicates which documents they are more interested in with their clicks, and this indicates what is relevant to the search. The most relevant click data to determine relevance of results is often the last viewed web page rather than all of the pages clicked on in a search session. Click data outside of search sessions can also be used to improve the accuracy of relevant results for users. [9]
The search results to a given query are usually subject to positional bias. [16] This is because users tend to select links that are at the top of result lists. However, this position does not mean a result is the most relevant since relevance can change over time. As a part of a machine learning approach to improving the result order, human editors begin by supplying an original rank for each result to the algorithm. Then, live user click feedback in the form of tracked click-through rates (CTR) in search sessions can be used to rerank the results based on the data. [9] This improves the order of the results based on the live indicated relevance from the users. [16]
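The reranking step described above can be sketched as follows. The example assumes each result carries an editor-assigned rank plus tracked impressions and clicks, and it reorders results by a smoothed click-through rate, falling back to the editorial rank on ties. The smoothing constants, field names, and sample numbers are assumptions; the sketch also does not correct for positional bias, which a production system would need to handle.

```python
def rerank(results, prior_clicks=1.0, prior_impressions=10.0):
    """Order results by smoothed CTR, breaking ties by editorial rank."""

    def smoothed_ctr(result):
        # Additive smoothing keeps results with few impressions
        # from jumping to extreme CTR values.
        return (result["clicks"] + prior_clicks) / (
            result["impressions"] + prior_impressions
        )

    return sorted(results, key=lambda r: (-smoothed_ctr(r), r["editor_rank"]))


# Hypothetical tracked data for one query.
results = [
    {"url": "A", "editor_rank": 1, "impressions": 1000, "clicks": 50},
    {"url": "B", "editor_rank": 2, "impressions": 1000, "clicks": 120},
    {"url": "C", "editor_rank": 3, "impressions": 1000, "clicks": 30},
]

order = [r["url"] for r in rerank(results)]
print(order)
```

Here result B overtakes the editors' first choice because users clicked it more often for the same number of impressions.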
Click dwell time and click sequence information can also be used to improve the relevance of search results. [17] Click dwell time is how long a user takes to return to the search engine results page (SERP) after clicking on a particular result, and this can indicate how satisfied the user is with a particular result. [17] Eye-tracking research indicates that users exhibit an abundance of non-sequential viewing activity when looking at search results. [17] Click models that abide by “top-down” user click behavior cannot interpret the user process of revisiting pages. [17]
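Click dwell time as defined above can be computed from a simple event log. The sketch below assumes a hypothetical log of (timestamp in seconds, event kind, URL) tuples, where a "click" event is followed by a "serp_return" event when the user comes back to the results page; the event names and sample data are invented for the example.

```python
def dwell_times(events):
    """Map each clicked URL to the seconds elapsed before returning to the SERP."""
    dwell = {}
    pending = None  # the most recent click that has not been closed yet
    for timestamp, kind, url in events:
        if kind == "click":
            pending = (timestamp, url)
        elif kind == "serp_return" and pending is not None:
            clicked_at, clicked_url = pending
            dwell[clicked_url] = timestamp - clicked_at
            pending = None
    return dwell


# Hypothetical session: two results visited and abandoned, one never left.
events = [
    (0, "click", "/a"),
    (5, "serp_return", None),
    (8, "click", "/b"),
    (98, "serp_return", None),
    (100, "click", "/c"),
]

print(dwell_times(events))
```

The final click has no recorded return, so it gets no dwell time in this sketch; how to treat such last clicks is a modeling decision left open here.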
Supply-demand mismatch costs can be reduced through click tracking. [18] Huang et al. define strategic customers as “forward looking” individuals who know that their clicks are being tracked and expect that companies will engage in appropriate business activities. In their study, the researchers used clickstream data from customers to observe their preferences and desired product quantities. Noisy clicks occur when customers click on a product but do not actually buy it. This leads to imperfect advanced demand information (ADI). [18]
Click tracking can be used in the realm of advertising, but there is the potential for this tool to be used negatively. Publishers display advertisements on their websites, and they receive money depending on the amount of traffic, measured as a number of clicks, that they send to the advertiser's website. [7] Click fraud is when publishers fake clicks to generate revenue for themselves. In the 2012 Fraud Detection in Mobile Advertising (FDMA) conference, competition teams were tasked with using data mining and machine learning techniques to identify “fraudulent publishers” in a given dataset. [7] A successful algorithm is able to observe and use morning and night click traffic patterns. A dense concentration of clicks between these main patterns is often an indicator of a fraudulent publisher. [7]
Website content can be adjusted to make it specific to users using “user navigational behavior” and user interests in a process called web personalization. [2] Web personalization is useful in the realm of e-commerce. There are unique steps in the process of web personalization, and the first step is noted as “user profiling.” [2] In this step, the user is understood and constituted through their click behavior, preferences, and qualities. Following user profiling is “log analysis and web usage mining.” [2]
Phishing is usually administered through emails, and when a user clicks on a phishing attempt email, their information will be leaked to particular websites. [19] Spear-phishing is a more “targeted” form of phishing in which user information is used to personalize emails and entice users to click. [19] Some phishing emails will also contain other links and attachments. Once these are either clicked or downloaded, users’ privacy can be encroached upon. Lin et al. conducted a study to see which psychological “weapons of influence” and “life domains” affect users most in phishing attempts, and they found that scarcity was the most influential weapon of influence, and the legal domain was the most influential life domain. [19] Age is also an important factor in determining those who are more susceptible to clicking on phishing attempts. [19]
When a virus infects a computer, it finds email addresses and sends copies of itself through these emails. These emails will usually contain an attachment and will be sent to several individuals. [20] This differs from user email account behavior because users tend to have a particular network they communicate with regularly. [20] Researchers studied how the Email Mining Toolkit (EMT) could be used to detect viruses by studying such user email account behavior and found that it was easier to decipher quick, broad viral propagations in comparison to slow, gradual viral propagations. [20]
In order to know what emails users have opened, email senders engage in email tracking. [21] By merely opening an email, users' email addresses can be leaked to third parties, and if users click on links within the emails, their email address can get leaked to a larger number of third parties. [21] Also, each time a user opens an email sent to them, their information can get sent to a new third party among those that their address has already been leaked to. Many third party email trackers are also involved in web tracking, leading to further user profiling. [21]
Privacy-protection models anonymize data after it is sent to a server and stored in a database. [6] Hence, user personal identification information is still collected, and this collection process is based on users trusting such servers. Researchers study giving users control over what information is sent from their mobile devices. They also observe giving users control over how that information is represented in databases in the realm of trajectory data, and they create a system that allows for this approach. This approach gives users the potential to increase their privacy. [6]
When user privacy is going to be encroached upon, consent forms are often distributed. The type of user activity required in these forms can have an effect on how much information a user retains from the form. [5] Karegar et al. compare the simple agree/disagree format with forms that incorporate checkboxes, drag and drop (DAD), and swipe features. When testing what information users would agree to disclose with each of the consent form formats, researchers observed that users presented with DAD forms had a greater number of eye fixations on the given consent form. [5]
When a third party is associated with a first-party website or mobile application, any time a user visits the first-party website or mobile application, their information will be sent to the third party. [22] Third-party tracking generates more privacy concerns than first-party tracking because it allows many website or application records about a particular user to be combined, yielding better user profiles. [22] Binns et al. found that among 5000 popular websites, the top two websites alone had 2000 trackers. Of the 2000 embedded trackers, 253 were used in 25 other websites. [22] Researchers evaluated the reach of third-party trackers based on their contact with users rather than websites, so more "popular" trackers were those that received information about the highest number of people rather than those with code embedded in the most first parties. [22] Google and Facebook were deemed the first and second largest web trackers, and Google and Twitter were deemed the first and second largest mobile trackers. [22]
Keystroke logging, often referred to as keylogging or keyboard capturing, is the action of recording (logging) the keys struck on a keyboard, typically covertly, so that a person using the keyboard is unaware that their actions are being monitored. Data can then be retrieved by the person operating the logging program. A keystroke recorder or keylogger can be either software or hardware.
Personal information management (PIM) is the study and implementation of the activities that people perform in order to acquire or create, store, organize, maintain, retrieve, and use informational items such as documents, web pages, and email messages for everyday use to complete tasks and fulfill a person's various roles; it is information management with intrapersonal scope. Personal knowledge management is by some definitions a subdomain.
Phishing is a form of social engineering and scam where attackers deceive people into revealing sensitive information or installing malware such as ransomware. Phishing attacks have become increasingly sophisticated and often transparently mirror the site being targeted, allowing the attacker to observe everything while the victim is navigating the site, and traverse any additional security boundaries with the victim. As of 2020, it is the most common type of cybercrime, with the FBI's Internet Crime Complaint Center reporting more incidents of phishing than any other type of computer crime.
A recommender system, or a recommendation system, is a subclass of information filtering system that provides suggestions for items that are most pertinent to a particular user. Recommender systems are particularly useful when an individual needs to choose an item from a potentially overwhelming number of items that a service may offer.
Internet privacy involves the right or mandate of personal privacy concerning the storage, re-purposing, provision to third parties, and display of information pertaining to oneself via the Internet. Internet privacy is a subset of data privacy. Privacy concerns have been articulated from the beginnings of large-scale computer sharing and especially relate to mass surveillance.
Online advertising, also known as online marketing, Internet advertising, digital advertising or web advertising, is a form of marketing and advertising that uses the Internet to promote products and services to audiences and platform users. Online advertising includes email marketing, search engine marketing (SEM), social media marketing, many types of display advertising, and mobile advertising. Advertisements are increasingly being delivered via automated software systems operating across multiple websites, media services and platforms, known as programmatic advertising.
HTTP cookies are small blocks of data created by a web server while a user is browsing a website and placed on the user's computer or other device by the user's web browser. Cookies are placed on the device used to access a website, and more than one cookie may be placed on a user's device during a session.
A click path or clickstream is the sequence of hyperlinks one or more website visitors follows on a given site, presented in the order viewed. A visitor's click path may start within the website or at a separate third party website, often a search engine results page, and it continues as a sequence of successive webpages visited by the user. Click paths take call data and can match it to ad sources, keywords, and/or referring domains, in order to capture data.
A device fingerprint or machine fingerprint is information collected about the software and hardware of a remote computing device for the purpose of identification. The information is usually assimilated into a brief identifier using a fingerprinting algorithm. A browser fingerprint is information collected specifically by interaction with the web browser of the device.
Targeted advertising is a form of advertising, including online advertising, that is directed towards an audience with certain traits, based on the product or person the advertiser is promoting.
Collaborative search engines (CSE) are Web search engines and enterprise searches within company intranets that let users combine their efforts in information retrieval (IR) activities, share information resources collaboratively using knowledge tags, and allow experts to guide less experienced people through their searches. Collaboration partners do so by providing query terms, collective tagging, adding comments or opinions, rating search results, and links clicked of former (successful) IR activities to users having the same or a related information need.
Web tracking is the practice by which operators of websites and third parties collect, store and share information about visitors' activities on the World Wide Web. Analysis of a user's behaviour may be used to provide content that enables the operator to infer their preferences and may be of interest to various parties, such as advertisers. Web tracking can be part of visitor management.
Evercookie is a JavaScript application programming interface (API) that identifies and reproduces intentionally deleted cookies on the clients' browser storage. It was created by Samy Kamkar in 2010 to demonstrate the possible infiltration from the websites that use respawning. Websites that have adopted this mechanism can identify users even if they attempt to delete the previously stored cookies.
Mouse tracking is the use of software to collect users' mouse cursor positions on the computer. The goal is to automatically gather richer information about what people are doing, typically to improve the design of an interface. Often this is done on the Web and can supplement eye tracking in some situations.
In web analytics, a session, or visit is a unit of measurement of a user's actions taken within a period of time or with regard to completion of a task. Sessions are also used in operational analytics and provision of user-specific recommendations. There are two primary methods used to define a session: time-oriented approaches based on continuity in user activity and navigation-based approaches based on continuity in a chain of requested pages.
Social media mining is the process of obtaining data from user-generated content on social media in order to extract actionable patterns, form conclusions about users, and act upon the information. Mining supports targeting advertising to users or academic research. The term is an analogy to the process of mining for minerals. Mining companies sift through raw ore to find the valuable minerals; likewise, social media mining sifts through social media data in order to discern patterns and trends about matters such as social media usage, online behaviour, content sharing, connections between individuals, and buying behaviour. These patterns and trends are of interest to companies, governments and not-for-profit organizations, as such organizations can use the analyses for tasks such as designing strategies and introducing programs, products, processes or services.
Dataveillance is the practice of monitoring and collecting online data as well as metadata. The word is a portmanteau of data and surveillance. Dataveillance is concerned with the continuous monitoring of users' communications and actions across various platforms. For instance, dataveillance refers to the monitoring of data resulting from credit card transactions, GPS coordinates, emails, social networks, etc. Using digital media often leaves traces of data and creates a digital footprint of our activity. Unlike sousveillance, this type of surveillance is not often known and happens discreetly. Dataveillance may involve the surveillance of groups of individuals. There exist three types of dataveillance: personal dataveillance, mass dataveillance, and facilitative mechanisms.
Social navigation is a form of social computing introduced by Paul Dourish and Matthew Chalmers in 1994, who defined it as when "movement from one item to another is provoked as an artifact of the activity of another or a group of others". According to later research in 2002, "social navigation exploits the knowledge and experience of peer users of information resources" to guide users in the information space, and that it is becoming more difficult to navigate and search efficiently with all the digital information available from the World Wide Web and other sources. Studying others' navigational trails and understanding their behavior can help improve one's own search strategy by guiding them to make more informed decisions based on the actions of others.
Search engine privacy is a subset of internet privacy that deals with user data being collected by search engines. Both types of privacy fall under the umbrella of information privacy. Privacy concerns regarding search engines can take many forms, such as the ability for search engines to log individual search queries, browsing history, IP addresses, and cookies of users, and conducting user profiling in general. The collection of personally identifiable information (PII) of users by search engines is referred to as tracking.
Spy pixels or tracker pixels are hyperlinks to remote image files in HTML email messages that have the effect of spying on the person reading the email if the image is downloaded. They are commonly embedded in the HTML of an email as small, imperceptible, transparent graphic files. Spy pixels are commonly used in marketing, and there are several countermeasures in place that aim to block email tracking pixels. However, there are few regulations in place that effectively guard against email tracking approaches.