Referrer spam

Last updated
This is an excerpt of a screenshot of Referrer spam in the output of the Webalizer website analytics software. Referrer spam in Webalizer.png
This is an excerpt of a screenshot of Referrer spam in the output of the Webalizer website analytics software.

Referrer spam (also known as referral spam, log spam or referrer bombing) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referrer URL to the site the spammer wishes to advertise. [1] Sites that publish their access logs, including referrer statistics, will then inadvertently link back to the spammer's site. These links will be indexed by search engines as they crawl the access logs, improving the spammer's search engine ranking. [2]

Contents

At least since 2014, a new variation of this form of spam occurs on Google Analytics. Spammers send fake visits to Google Analytics, often without ever accessing the affected site. The technique is used to have the spammers' URLs appear in the site statistics, inducing the site owner to visit the spam URLs. When it is the case that the spammer has never visited the affected site, the fake visits are also called Ghost Spam. [2]

Mitigations

Techniques for mitigating referrer spam include blocking spam crawlers and filtering out known spam domains in analytics software. [3] The open-source analytics company Matomo maintains a public domain crowdsourced list of spam-associated domains which it uses in automatic filters. [4]

See also

Notes

  1. Pollitt, Michael (2005-08-24). "Moral maze". The Guardian . Archived from the original on 2014-09-19. Retrieved 2022-10-06.
  2. 1 2 "Referral spam: attack patterns and countermeasures". IONOS Digital Guide. Retrieved 2023-05-14.
  3. "How to Block WordPress Referrer Spam in Google Analytics". www.wpbeginner.com. 2022-08-28. Retrieved 2023-05-14.
  4. Team, Matomo Core (2015-05-13). "Stopping Referrer Spam". Analytics Platform - Matomo. Retrieved 2023-05-14.

Related Research Articles

Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

<span class="mw-page-title-main">Link farm</span> Group of websites that link to each other

On the World Wide Web, a link farm is any group of websites that all hyperlink to other sites in the group for the purpose of increasing SEO rankings. In graph theoretic terms, a link farm is a clique. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a web search engine. Other link exchange systems are designed to allow individual websites to selectively exchange links with other relevant websites, and are not considered a form of spamdexing.

Various anti-spam techniques are used to prevent email spam.

Spam in blogs is a form of spamdexing which utilizes internet sites that allow content to be publicly posted, in order to artificially inflate their website ranking by linking back to their web pages. Backlinking helps search algorithms determine the popularity of a web page, which plays a major role for search engines like Google and Microsoft Bing to decide a web page ranking on a certain search query. This helps the spammer's website to list ahead of other sites for certain searches, which helps them to increase the number of visitors to their website.

<span class="mw-page-title-main">Metasearch engine</span> Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

Doorway pages are web pages that are created for the deliberate manipulation of search engine indexes (spamdexing). A doorway page will affect the index of a search engine by inserting results for particular phrases while sending visitors to a different page. Doorway pages that redirect visitors without their knowledge use some form of cloaking. This usually falls under Black Hat SEO.

URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. Similarly, domain redirection or domain forwarding is when all pages in a URL domain are redirected to a different domain, as when wikipedia.com and wikipedia.net are automatically redirected to wikipedia.org.

The anchor text, link label, or link text is the visible, clickable text in an HTML hyperlink. The term "anchor" was used in older versions of the HTML specification for what is currently referred to as the a element, or <a>. The HTML specification does not have a specific term for anchor text, but refers to it as "text that the a element wraps around". In XML terms, the anchor text is the content of the element, provided that the content is text.

Keyword stuffing is a search engine optimization (SEO) technique, considered webspam or spamdexing, in which keywords are loaded into a web page's meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being temporarily or permanently banned or penalized on major search engines. The repetition of words in meta tags may explain why many search engines no longer use these tags. Nowadays, search engines focus more on the content that is unique, comprehensive, relevant, and helpful that overall makes the quality better which makes keyword stuffing useless, but it is still practiced by many webmasters.

<span class="mw-page-title-main">Text Retrieval Conference</span> Meetings for information retrieval research

The Text REtrieval Conference (TREC) is an ongoing series of workshops focusing on a list of different information retrieval (IR) research areas, or tracks. It is co-sponsored by the National Institute of Standards and Technology (NIST) and the Intelligence Advanced Research Projects Activity, and began in 1992 as part of the TIPSTER Text program. Its purpose is to support and encourage research within the information retrieval community by providing the infrastructure necessary for large-scale evaluation of text retrieval methodologies and to increase the speed of lab-to-product transfer of technology.

TrustRank is an algorithm that conducts link analysis to separate useful webpages from spam and helps search engine rank pages in SERPs. It is semi-automated process which means that it needs some human assistance in order to function properly. Search engines have many different algorithms and ranking factors that they use when measuring the quality of webpages. TrustRank is one of them.

A spam blog, also known as an auto blog or the neologism splog, is a blog which the author uses to promote affiliated websites, to increase the search engine rankings of associated sites or to simply sell links/ads.

A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data.

URL shortening is a technique on the World Wide Web in which a Uniform Resource Locator (URL) may be made substantially shorter and still direct to the required page. This is achieved by using a redirect which links to the web page that has a long URL. For example, the URL "https://en.wikipedia.org/wiki/URL_shortening" can be shortened to "https://w.wiki/U". Often the redirect domain name is shorter than the original one. A friendly URL may be desired for messaging technologies that limit the number of characters in a message, for reducing the amount of typing required if the reader is copying a URL from a print source, for making it easier for a person to remember, or for the intention of a permalink. In November 2009, the shortened links of the URL shortening service Bitly were accessed 2.1 billion times.

In HTTP networking, typically on the World Wide Web, referer spoofing sends incorrect referer information in an HTTP request in order to prevent a website from obtaining accurate data on the identity of the web page previously visited by the user.

<span class="mw-page-title-main">HTTP referer</span> HTTP header field

In HTTP, "Referer" is an optional HTTP header field that identifies the address of the web page, from which the resource has been requested. By checking the referrer, the server providing the new web page can see where the request originated.

Website spoofing is the act of creating a website with the intention of misleading readers that the website has been created by a different person or organization. Normally, the spoof website will adopt the design of the target website, and it sometimes has a similar URL. A more sophisticated attack results in an attacker creating a "shadow copy" of the World Wide Web by having all of the victim's traffic go through the attacker's machine, causing the attacker to obtain the victim's sensitive information.

Adversarial information retrieval is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

A content farm or content mill is a company that employs large numbers of freelance writers or uses automated tools to generate a large amount of textual web content which is specifically designed to satisfy algorithms for maximal retrieval by search engines, known as SEO. Their main goal is to generate advertising revenue through attracting reader page views, as first exposed in the context of social spam.

Social spam is unwanted spam content appearing on social networking services, social bookmarking sites, and any website with user-generated content. It can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information.