Referrer spam (also known as referral spam, log spam or referrer bombing) is a kind of spamdexing (spamming aimed at search engines). The technique involves making repeated web site requests using a fake referrer URL to the site the spammer wishes to advertise. [1] Sites that publish their access logs, including referrer statistics, will then inadvertently link back to the spammer's site. These links will be indexed by search engines as they crawl the access logs, improving the spammer's search engine ranking. [2]
At least since 2014, a new variation of this form of spam occurs on Google Analytics. Spammers send fake visits to Google Analytics, often without ever accessing the affected site. The technique is used to have the spammers' URLs appear in the site statistics, inducing the site owner to visit the spam URLs. If the spammer has never visited the affected site, the fake visits are also called ghost spam. [2]
Techniques for mitigating referrer spam include blocking spam crawlers and filtering out known spam domains in analytics software. [3] The open-source analytics company Matomo maintains a public domain crowdsourced list of spam-associated domains which it uses in automatic filters. [4]
Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating related and/or unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.
A Domain Name System blocklist, Domain Name System-based blackhole list, Domain Name System blacklist (DNSBL) or real-time blackhole list (RBL) is a service for operation of mail servers to perform a check via a Domain Name System (DNS) query whether a sending host's IP address is blacklisted for email spam. Most mail server software can be configured to check such lists, typically rejecting or flagging messages from such sites.
On the World Wide Web, a link farm is any group of websites that all hyperlink to other sites in the group for the purpose of increasing SEO rankings. In graph theoretic terms, a link farm is a clique. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a web search engine. Other link exchange systems are designed to allow individual websites to selectively exchange links with other relevant websites, and are not considered a form of spamdexing.
Spam in blogs is a form of spamdexing which utilizes internet sites that allow content to be publicly posted, in order to artificially inflate their website ranking by linking back to their web pages. Backlinking helps search algorithms determine the popularity of a web page, which plays a major role for search engines like Google and Microsoft Bing to decide a web page ranking on a certain search query. This helps the spammer's website to list ahead of other sites for certain searches, which helps them to increase the number of visitors to their website.
A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.
Doorway pages are web pages that are created for the deliberate manipulation of search engine indexes (spamdexing). A doorway page will affect the index of a search engine by inserting results for particular phrases while sending visitors to a different page. Doorway pages that redirect visitors without their knowledge use some form of cloaking. This usually falls under Black Hat SEO.
URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. Similarly, domain redirection or domain forwarding is when all pages in a URL domain are redirected to a different domain, as when wikipedia.com and wikipedia.net are automatically redirected to wikipedia.org.
The anchor text, link label, or link text is the visible, clickable text in an HTML hyperlink. The term "anchor" was used in older versions of the HTML specification for what is currently referred to as the "a element", or <a>
. The HTML specification does not have a specific term for anchor text, but refers to it as "text that the a element wraps around". In XML terms, the anchor text is the content of the element, provided that the content is text.
TrustRank is an algorithm that conducts link analysis to separate useful webpages from spam and helps search engine rank pages in SERPs. It is semi-automated process which means that it needs some human assistance in order to function properly. Search engines have many different algorithms and ranking factors that they use when measuring the quality of webpages. TrustRank is one of them.
A spam blog, also known as an auto blog or the neologism splog, is a blog which the author uses to promote affiliated websites, to increase the search engine rankings of associated sites or to simply sell links/ads.
A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data.
Google Analytics is a web analytics service offered by Google that tracks and reports website traffic and also mobile app traffic & events, currently as a platform inside the Google Marketing Platform brand. Google launched the service in November 2005 after acquiring Urchin.
URL shortening is a technique on the World Wide Web in which a Uniform Resource Locator (URL) may be made substantially shorter and still direct to the required page. This is achieved by using a redirect which links to the web page that has a long URL. For example, the URL "https://en.wikipedia.org/wiki/URL_shortening" can be shortened to "https://w.wiki/U". Often the redirect domain name is shorter than the original one. A friendly URL may be desired for messaging technologies that limit the number of characters in a message, for reducing the amount of typing required if the reader is copying a URL from a print source, for making it easier for a person to remember, or for the intention of a permalink. In November 2009, the shortened links of the URL shortening service Bitly were accessed 2.1 billion times.
In HTTP networking, typically on the World Wide Web, referer spoofing sends incorrect referer information in an HTTP request in order to prevent a website from obtaining accurate data on the identity of the web page previously visited by the user.
A search engine is a software system that provides hyperlinks to web pages and other relevant information on the Web in response to a user's query. The user inputs a query within a web browser or a mobile app, and the search results are often a list of hyperlinks, accompanied by textual summaries and images. Users also have the option of limiting the search to a specific type of results, such as images, videos, or news.
In HTTP, "Referer" is an optional HTTP header field that identifies the address of the web page from which the resource has been requested. By checking the referrer, the server providing the new web page can see where the request originated.
Website spoofing is the act of creating a website with the intention of misleading readers that the website has been created by a different person or organization. Normally, the spoof website will adopt the design of the target website, and it sometimes has a similar URL. A more sophisticated attack results in an attacker creating a "shadow copy" of the World Wide Web by having all of the victim's traffic go through the attacker's machine, causing the attacker to obtain the victim's sensitive information.
Adversarial information retrieval is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.
A content farm or content mill is a company that employs freelance creators or uses automated tools to generate a large amount of web content which is specifically designed to satisfy algorithms for maximal retrieval by search engines, known as SEO. Their main goal is to generate advertising revenue through attracting page views, as first exposed in the context of social spam.
Social spam is unwanted spam content appearing on social networking services, social bookmarking sites, and any website with user-generated content. It can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information.