Referrer spam

Referrer spam (also known as referral spam, log spam or referrer bombing) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated website requests using a fake referrer URL that points to the site the spammer wishes to advertise. [1] Sites that publish their access logs, including referrer statistics, will then inadvertently link back to the spammer's site. Search engines index these links as they crawl the access logs, improving the spammer's search engine ranking. [2]

Since at least 2014, a variation of this form of spam has targeted Google Analytics. Spammers send fake visits to Google Analytics, often without ever accessing the affected site. The technique makes the spammers' URLs appear in the site statistics, inducing the site owner to visit the spam URLs. When the spammer has never visited the affected site, the fake visits are also called ghost spam. [2]

Mitigations

Techniques for mitigating referrer spam include blocking spam crawlers and filtering out known spam domains in analytics software. [3] The open-source analytics company Matomo maintains a public-domain, crowdsourced list of spam-associated domains, which it uses in automatic filters. [4]
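
The domain-filtering approach described above can be sketched in Python as follows. This is a minimal illustration, not how any particular analytics product implements it: the blocklist entries, the `referrer` field name and the function names are all hypothetical, and a real deployment would load a maintained list instead of hard-coding one.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries, used only for illustration.
SPAM_DOMAINS = {"spam-example.test", "cheap-seo.example"}


def referrer_host(referrer: str) -> str:
    """Return the lowercase hostname of a referrer URL, or '' if it has none."""
    try:
        return (urlsplit(referrer).hostname or "").lower()
    except ValueError:
        # Malformed URLs (for example, a broken IPv6 literal) have no usable host.
        return ""


def is_spam_referrer(referrer: str, blocklist: set[str] = SPAM_DOMAINS) -> bool:
    """True if the referrer's host is a blocklisted domain or one of its subdomains."""
    host = referrer_host(referrer)
    return any(host == d or host.endswith("." + d) for d in blocklist)


def filter_hits(hits: list[dict]) -> list[dict]:
    """Drop hits whose (assumed) 'referrer' field matches the blocklist."""
    return [h for h in hits if not is_spam_referrer(h.get("referrer", ""))]
```

Matching on the parsed hostname instead of searching the raw URL string avoids false positives on hosts that merely contain a blocklisted name, while still catching subdomains of a listed domain.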

Notes

  1. Pollitt, Michael (2005-08-24). "Moral maze". The Guardian. Archived from the original on 2014-09-19. Retrieved 2022-10-06.
  2. "Referral spam: attack patterns and countermeasures". IONOS Digital Guide. Retrieved 2023-05-14.
  3. "How to Block WordPress Referrer Spam in Google Analytics". www.wpbeginner.com. 2022-08-28. Retrieved 2023-05-14.
  4. Matomo Core Team (2015-05-13). "Stopping Referrer Spam". Analytics Platform - Matomo. Retrieved 2023-05-14.

Related Research Articles

Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

Link farm: Group of websites that link to each other

On the World Wide Web, a link farm is any group of websites that all hyperlink to other sites in the group for the purpose of increasing SEO rankings. In graph theoretic terms, a link farm is a clique. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a web search engine. Other link exchange systems are designed to allow individual websites to selectively exchange links with other relevant websites and are not considered a form of spamdexing.

Spam in blogs is a form of spamdexing. It may be done by posting random comments on other blog websites, or by copying other websites' content and using it on free-to-use publishing services like Blogger and WordPress or publicly accessible wikis, digital guest books, and internet forums.

Metasearch engine: Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

Doorway pages are web pages that are created for the deliberate manipulation of search engine indexes (spamdexing). A doorway page will affect the index of a search engine by inserting results for particular phrases while sending visitors to a different page. Doorway pages that redirect visitors without their knowledge use some form of cloaking. This usually falls under Black Hat SEO.

Keyword stuffing is a search engine optimization (SEO) technique, considered webspam or spamdexing, in which keywords are loaded into a web page's meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being temporarily or permanently banned or penalized on major search engines. The repetition of words in meta tags may explain why many search engines no longer use these tags. Search engines now favor content that is unique, comprehensive, relevant, and helpful, which makes keyword stuffing ineffective, although many webmasters still practice it.

Email harvesting or scraping is the process of obtaining lists of email addresses using various methods. Typically these are then used for bulk email or spam.

TrustRank is an algorithm that conducts link analysis to separate useful webpages from spam and helps search engines rank pages in SERPs. It is a semi-automated process, which means that it needs some human assistance to function properly. Search engines have many different algorithms and ranking factors that they use when measuring the quality of webpages. TrustRank is one of them.

A spam blog, also known as an auto blog or the neologism splog, is a blog which the author uses to promote affiliated websites, to increase the search engine rankings of associated sites or to simply sell links/ads.

A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data. Scraper sites come in various forms. Some provide little, if any, material or information, and are intended to obtain user information such as e-mail addresses, to be targeted for spam e-mail. Price aggregation and shopping sites access multiple listings of a product and allow a user to rapidly compare the prices.

Google Analytics: Web analytics service from Google

Google Analytics is a web analytics service offered by Google that tracks and reports website traffic as well as mobile app traffic and events, currently as a platform inside the Google Marketing Platform brand. Google launched the service in November 2005 after acquiring Urchin.

URL shortening is a technique on the World Wide Web in which a Uniform Resource Locator (URL) may be made substantially shorter and still direct to the required page. This is achieved by using a redirect which links to the web page that has a long URL. For example, the URL "https://example.com/assets/category_B/subcategory_C/Foo/" can be shortened to "https://example.com/Foo", and the URL "https://en.wikipedia.org/wiki/URL_shortening" can be shortened to "https://w.wiki/U". Often the redirect domain name is shorter than the original one. A friendly URL may be desired for messaging technologies that limit the number of characters in a message, for reducing the amount of typing required if the reader is copying a URL from a print source, for making it easier for a person to remember, or for the intention of a permalink. In November 2009, the shortened links of the URL shortening service Bitly were accessed 2.1 billion times.

HTTP referer: HTTP header field

In HTTP, "Referer" is an optional HTTP header field that identifies the address of the web page from which the resource has been requested. By checking the referrer, the server providing the new web page can see where the request originated.
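
As a rough illustration of how a server might read this header, the sketch below extracts the referring host from a mapping of request headers. The function name `referring_host` and the plain-dictionary representation of headers are assumptions made for the example; web frameworks expose headers through their own objects. Because the header is optional and its value is supplied by the client, the result may be absent or fake, which is exactly what referrer spam exploits.

```python
from typing import Mapping, Optional
from urllib.parse import urlsplit


def referring_host(headers: Mapping[str, str]) -> Optional[str]:
    """Return the host named in the Referer header, or None if absent or unparsable."""
    # Header names are case-insensitive, so normalise them before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("referer")  # the header name is spelled "Referer"
    if not value:
        return None
    try:
        # urlsplit lowercases the hostname and separates out any port.
        return urlsplit(value).hostname
    except ValueError:
        return None
```

A caller would treat a `None` result as "no referrer information" and treat any returned host as an unverified claim made by the client.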

Website spoofing is the act of creating a website with the intention of misleading readers that the website has been created by a different person or organization. Normally, the spoof website will adopt the design of the target website, and it sometimes has a similar URL. A more sophisticated attack results in an attacker creating a "shadow copy" of the World Wide Web by having all of the victim's traffic go through the attacker's machine, causing the attacker to obtain the victim's sensitive information.

Adversarial information retrieval is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

A content farm, or content mill, is a company that employs large numbers of freelance writers or uses AI tools to generate a large amount of textual web content specifically designed to satisfy algorithms for maximal retrieval by automated search engines, a practice known as SEO. Its main goal is to generate advertising revenue by attracting reader page views, as first exposed in the context of social spam.

Social spam is unwanted spam content appearing on social networking services, social bookmarking sites, and any website with user-generated content. It can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information.

Google Penguin was a codename for a Google algorithm update that was first announced on April 24, 2012. The update was aimed at decreasing the search engine rankings of websites that violate Google's Webmaster Guidelines by using what are now considered grey hat SEM techniques, which artificially increase the ranking of a webpage by manipulating the number of links pointing to the page. Such tactics are commonly described as link schemes. According to Google's John Mueller, as of 2013, Google announced all updates to the Penguin filter to the public.

SmartScreen is a cloud-based anti-phishing and anti-malware component included in several Microsoft products, including the operating systems Windows 8 and later and the applications Internet Explorer and Microsoft Edge. SmartScreen intelligence is also used in the backend of Microsoft's online services such as the web app Outlook.com and the Microsoft Bing search engine.