Article spinning

Article spinning is a writing technique, used in search engine optimization (SEO) and other applications, that creates what deceitfully appears to be new content from what already exists. Content spinning works by replacing specific words, phrases, sentences, or even entire paragraphs with any number of alternate versions in order to provide a slightly different variation with each spin, a practice also known as Rogeting. The process can be fully automated or performed manually as many times as needed. Early automated methods often produced articles that were hard or even impossible to read; as article-spinning techniques have been refined, however, they can now yield readable articles that, upon cursory review, can appear original.

The practice is sometimes considered to fall under the category of spamdexing, a black hat SEO practice, given that no genuinely new content is created. Website authors use article spinning to reduce the similarity ratio of largely redundant pages or pages with minimal, meaningless, or uninformative content, and to avoid penalties in the search engine results pages (SERPs) for duplicate content.
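
To make the notion of a "similarity ratio" concrete, the following minimal Python sketch scores two texts by Jaccard similarity over word 3-grams (shingles). This is an illustrative assumption about how duplication can be measured, not any search engine's actual method; all names here are invented for the example.

    def shingles(text, n=3):
        """Return the set of word n-grams (shingles) in a text."""
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def similarity_ratio(a, b):
        """Jaccard similarity between the shingle sets of two texts."""
        sa, sb = shingles(a), shingles(b)
        return len(sa & sb) / len(sa | sb) if sa or sb else 1.0

    original = "Article spinning creates what appears to be new content."
    spun = "Article spinning creates what seems to be new content."
    print(similarity_ratio(original, spun))  # 0.4: a one-word swap still leaves many shared shingles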

Article spinning is also used in other types of applications, such as message personalization and chatbots.

Regardless of the application, the end result is a proliferation of documents that are all similar but are superficially disguised as being different. The spin-generated documents can prove uninformative to the reader, thereby infuriating the end user.[1]

Automatic spinning

Automatic rewriting can change the meaning of a sentence through the use of words with similar but subtly different meanings to the original. For example, the word "picture" could be replaced by the word "image" or "photo". Thousands of word-for-word substitutions are stored in a text-file or database thesaurus to draw from, ensuring that a large percentage of words differ from the original article.
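
A minimal Python sketch of this substitution approach follows; the tiny THESAURUS dictionary is a made-up stand-in for the much larger text-file or database thesauri that real spinners draw from.

    import random

    # Made-up miniature thesaurus; real spinners use far larger ones.
    THESAURUS = {
        "picture": ["image", "photo"],
        "great": ["good", "grand"],
        "technique": ["method", "approach"],
    }

    def spin_words(text):
        """Replace each recognized word with a randomly chosen synonym."""
        out = []
        for token in text.split():
            word = token.lower().strip(".,")
            choices = THESAURUS.get(word)
            if choices:
                # Naive by design: ignores context, grammar, and capitalization,
                # which is exactly the weakness discussed below.
                out.append(token.lower().replace(word, random.choice(choices)))
            else:
                out.append(token)
        return " ".join(out)

    print(spin_words("A great picture is worth a great technique."))
    # e.g. "A good image is worth a grand method."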

The problem with simple automatic writing is that it cannot recognize context or grammar in the use of words and phrases. Poorly executed article spinning can result in unidiomatic phrasing that no human writer would choose. Some spinners may substitute a synonym with the wrong part of speech when encountering a word that can be used as either a noun or a verb, use an obscure word that appears only in very specific contexts, or improperly substitute proper nouns. For example, "Great Britain" could be auto-spun to "Good Britain". While "good" can be considered a synonym for "great", "Good Britain" does not have the same meaning as "Great Britain".

Article spinning can use a variety of methods; a straightforward one is "spintax". Spintax (or spin syntax) uses a marked-up version of text to indicate which parts should be altered or rearranged: the different variants of a paragraph, of one or several sentences, of word groups, or of single words are marked. This spintax can be extremely rich and complex, with many levels of depth (nested spinning); it acts as a tree, with large branches splitting into many smaller branches up to the leaves. To create readable articles out of spintax, software chooses one of the possible paths through the tree; this produces wide variations of the base article without significantly altering its meaning.
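
The brace-and-pipe notation {variant A|variant B} is the most common spintax syntax. The following minimal Python sketch, written under that assumption, resolves innermost groups first so that nested spinning falls out naturally:

    import random
    import re

    # Matches an innermost {a|b|c} group, i.e. one containing no nested braces.
    INNER = re.compile(r"\{([^{}]*)\}")

    def spin(spintax):
        """Return one randomly chosen variant of a nested spintax string."""
        while True:
            match = INNER.search(spintax)
            if not match:
                return spintax
            choice = random.choice(match.group(1).split("|"))
            spintax = spintax[:match.start()] + choice + spintax[match.end():]

    print(spin("The {quick|speedy} fox {jumps|{leaps|bounds}} over the lazy dog."))
    # e.g. "The speedy fox bounds over the lazy dog."

Each call walks a different random path through the tree of variants, yielding one spin per call.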

As of 2017, a number of websites will automatically spin content for an author, often with the end goal of attracting viewers to a website in order to display advertisements to them.

Manual spinning

Because of the problems with automated spinning, website owners may pay writers or specific companies to perform higher-quality spinning manually. Writers may also spin their own articles, allowing them to sell the same articles with slight variations to a number of clients or to use the article for multiple purposes, for example as content and also for article marketing.

Plagiarism and duplicate content

Google representatives say that Google does not penalize websites that host duplicate content, but advances in filtering techniques mean that duplicate content rarely features well in SERPs, which is a penalty in effect.[2] In 2010 and 2011, changes to Google's search algorithm targeting content farms aimed to penalize sites containing significant duplicate content.[3] In this context, article spinning can help, insofar as the spun text is not detected as duplicate content.

Criticisms

Article spinning is a way to create what looks like new content from existing content. As such, it can be seen as unethical, whether it is paraphrasing of copyrighted material (to try to evade copyright), deceiving readers into wasting their time for the benefit of the spinner (while not providing additional value to them), or both.[4]

References

  1. Gossman, Kathleen (June 15, 2012). "Spinning gets you nowhere".
  2. "Webmaster Help Centre: Little or no original content". Google Inc. Archived from the original on August 26, 2007. Retrieved September 18, 2007.
  3. Sullivan, Danny (February 25, 2011). "Google Forecloses On Content Farms With "Panda" Algorithm Update". Search Engine Land. Retrieved December 7, 2022.
  4. Edwards, Suzzane (December 14, 2011). "Eight Good Reasons Why Spinning Articles is Bad for your Website". Search Engine Journal. Retrieved July 24, 2017.