Article spinning

Article spinning is a writing technique used to deceitfully create what appears to be new content from pre-existing works. It is commonly used on the internet by websites as a method of search engine optimization (SEO) and by students as a form of plagiarism. Content spinning works by replacing specific words, phrases, sentences, or even entire paragraphs with any number of alternate versions, so that each spin produces a slightly different variation; the practice is also known as Rogeting. The process can be fully automated or carried out manually as many times as needed. Early content produced through automated methods was often hard or even impossible to read. However, as article-spinning techniques were refined they became more sophisticated, and they can now produce readable articles that, on cursory review, can appear original.

The practice is sometimes considered a form of spamdexing, a black-hat SEO practice, because no genuinely new content is created. Website authors use article spinning to lower the similarity ratio of largely redundant pages, or of pages with little, meaningless, or uninformative content, and to avoid being penalized in search engine results pages (SERPs) for duplicate content.

Article spinning is also used in other types of applications, such as message personalization and chatbots.

Regardless of the application, the result is a proliferation of documents that are all similar but are superficially disguised as being different. The spin-generated documents can prove uninformative to the reader, thereby infuriating the end user. [1]

Automatic spinning

Automatic rewriting can change the meaning of a sentence by using words whose meanings are similar to, but subtly different from, those of the original. For example, the word "picture" could be replaced by "image" or "photo". Thousands of word-for-word substitutions are stored in a thesaurus, held either in a text file or in a database, for the software to draw from. This ensures that a large percentage of the words differ from those in the original article.[citation needed]
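The word-for-word substitution described above can be sketched in a few lines of Python. This is an illustrative sketch only: the miniature THESAURUS table and the spin_words function are hypothetical stand-ins for the far larger thesaurus a real spinner would load from a file or database, and the code does not reproduce any particular spinning tool.

```python
import random
import re

# Hypothetical miniature thesaurus; real spinners draw on thousands of
# entries stored in a text file or database.
THESAURUS = {
    "picture": ["image", "photo"],
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
}

def spin_words(text: str, rng: random.Random) -> str:
    """Replace each word found in THESAURUS with a randomly chosen synonym."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        options = THESAURUS.get(word.lower())
        if not options:
            return word  # no thesaurus entry: keep the original word
        choice = rng.choice(options)
        # Preserve a leading capital letter from the original word.
        return choice.capitalize() if word[0].isupper() else choice

    return re.sub(r"[A-Za-z]+", swap, text)

rng = random.Random(42)
print(spin_words("A big picture of a quick fox.", rng))
```

Each run with a different seed can yield a different "spin" of the same sentence, while words absent from the table pass through unchanged.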

The problem with simple automatic rewriting is that it cannot recognize context or grammar in the use of words and phrases. Poorly done article spinning can produce unidiomatic phrasing that no human writer would choose. Some spinners substitute a synonym with the wrong part of speech when a word can serve as either a noun or a verb, use an obscure word that belongs only in very specific contexts, or improperly substitute words within proper nouns. For example, "Great Britain" could be auto-spun to "Good Britain": while "good" could be considered a synonym for "great", "Good Britain" does not have the same meaning as "Great Britain".[citation needed]
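The "Great Britain" failure can be reproduced with a context-blind spinner. In this Python sketch the SYNONYMS and PROTECTED tables, and the naive_spin and guarded_spin functions, are hypothetical; the protected-phrase list is shown only as one simple way a spinner might avoid altering known proper nouns, not as a technique any specific tool is confirmed to use.

```python
import re

# Hypothetical synonym table; "great" -> "good" is a plausible thesaurus entry.
SYNONYMS = {"great": "good", "picture": "image"}

# Hypothetical list of multi-word proper nouns that must never be altered.
PROTECTED = ["Great Britain"]

def naive_spin(text: str) -> str:
    """Swap every listed word, blind to context, grammar, or proper nouns."""
    def swap(m: re.Match) -> str:
        word = m.group(0)
        repl = SYNONYMS.get(word.lower())
        if repl is None:
            return word
        return repl.capitalize() if word[0].isupper() else repl

    return re.sub(r"[A-Za-z]+", swap, text)

def guarded_spin(text: str) -> str:
    """Spin only the text between protected phrases, leaving those phrases intact."""
    pattern = "(" + "|".join(re.escape(p) for p in PROTECTED) + ")"
    parts = re.split(pattern, text)
    # re.split with a capturing group places the protected matches at odd indices.
    return "".join(p if i % 2 else naive_spin(p) for i, p in enumerate(parts))

print(naive_spin("A great picture of Great Britain"))
# -> A good image of Good Britain
print(guarded_spin("A great picture of Great Britain"))
# -> A good image of Great Britain
```

The naive version changes the meaning of the proper noun exactly as described; the guarded version only works for phrases someone has listed in advance, which illustrates why context-blind spinning remains error-prone.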

Article spinning can use a variety of methods; a straightforward one is "spintax". Spintax (or spin syntax) uses a marked-up version of the text to indicate which parts should be altered or rearranged. Variants may be marked for a whole paragraph, one or several sentences, groups of words, or single words. Spintax can be extremely rich and complex, with many levels of nesting (nested spinning). It acts as a tree whose large branches split into many smaller branches, down to the leaves. To create readable articles from spintax, a dedicated software application chooses one of the possible paths through the tree; this yields wide variation on the base article without significantly altering its meaning.[citation needed]
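The tree-walking behaviour described above can be sketched in Python. The sketch assumes the brace-and-pipe notation commonly used for spintax, {option one|option two}, with groups nested inside other groups; this is a widespread convention rather than a formal standard, and the spin and all_variants functions and the TEMPLATE string are hypothetical illustrations.

```python
from __future__ import annotations

import random
import re

# Matches an innermost group: braces with no other braces inside them.
INNERMOST = re.compile(r"\{([^{}]*)\}")

def spin(spintax: str, rng: random.Random) -> str:
    """Pick one path through the tree by resolving innermost groups first."""
    while True:
        text, n = INNERMOST.subn(
            lambda m: rng.choice(m.group(1).split("|")), spintax
        )
        if n == 0:
            return text  # no groups left: a plain-text variant
        spintax = text

def all_variants(spintax: str) -> set[str]:
    """Expand every path through the tree (exponential; small inputs only)."""
    m = INNERMOST.search(spintax)
    if m is None:
        return {spintax}
    results: set[str] = set()
    for option in m.group(1).split("|"):
        results |= all_variants(spintax[: m.start()] + option + spintax[m.end():])
    return results

TEMPLATE = (
    "{Article spinning|Content spinning} {creates|produces} "
    "{new-looking|{seemingly|apparently} new} text."
)

print(spin(TEMPLATE, random.Random(7)))
print(len(all_variants(TEMPLATE)))  # -> 12
```

Resolving innermost groups first lets a nested group such as {seemingly|apparently} be chosen before its enclosing branch, so every leaf-to-root path in the tree can be produced; here 2 × 2 × 3 paths give 12 distinct sentences.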

As of 2017, there are a number of websites that will automatically spin content for an author, often with the end goal of attracting viewers to a website in order to display advertisements to them.[citation needed]

Manual spinning

Because of the problems with automated spinning, website owners may pay writers or specialized companies to perform higher-quality spinning manually. Writers may also spin their own articles, which lets them sell slightly varied versions of the same article to several clients, or reuse an article for multiple purposes, for example as website content and for article marketing.

Plagiarism and duplicate content

In academia, students sometimes use article spinning to plagiarise other people's work while evading detection by their teachers or by automated checking tools such as Turnitin or its iThenticate system. [2] Many websites offer text-spinning services to students. Unlike large language models, these tools are not designed to produce natural-sounding writing; rather, they take a source text, preserve its meaning and structure, and swap out enough synonyms that the plagiarism goes undetected. [3]
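Why swapping "enough synonyms" can defeat automated checks is easiest to see with word n-gram ("shingle") overlap, a generic text-similarity measure. The Python sketch below is an assumption-laden illustration: the shingles and jaccard functions and the choice of three-word shingles are hypothetical, and the source does not describe how Turnitin, iThenticate, or any other checker actually compares documents.

```python
from __future__ import annotations

import re

def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences (shingles) in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i : i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

source = "the quick brown fox jumps over the lazy dog"
one_swap = "the quick brown fox jumps over the idle dog"
three_swaps = "the fast brown fox leaps over the idle dog"

print(jaccard(shingles(source), shingles(source)))                 # -> 1.0
print(round(jaccard(shingles(source), shingles(one_swap)), 3))     # -> 0.556
print(jaccard(shingles(source), shingles(three_swaps)))            # -> 0.0
```

In this toy case, replacing three of nine words removes every shared three-word sequence even though the sentence structure and meaning are unchanged, which is the effect the spinning services rely on.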

Google representatives say that Google does not penalize websites that host duplicate content, but advances in filtering techniques mean that duplicate content will rarely rank well in SERPs, which is itself a form of penalty. [4] In 2010 and 2011, changes to Google's search algorithm targeting content farms aimed to penalize sites containing significant duplicate content. [5] In this context, article spinning might help a site avoid such penalties, because spun text may not be detected as duplicate content.
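A simple way to see why spun text can slip past duplicate-content filtering is exact-match fingerprinting, in which normalized text is hashed and identical hashes flag copies. This Python sketch is hypothetical: the content_fingerprint function and its normalization rules (lower-casing, keeping only letters, digits, and apostrophes) are assumptions for illustration, and the source does not describe the filtering techniques Google actually uses.

```python
import hashlib
import re

def content_fingerprint(text: str) -> str:
    """Hash a normalized copy of text: lower-cased, punctuation and extra spaces removed."""
    normalized = " ".join(re.findall(r"[a-z0-9']+", text.lower()))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

original = "Article spinning creates new-looking content."
copied = "ARTICLE spinning   creates new-looking content!"
spun = "Article spinning produces new-looking content."

print(content_fingerprint(original) == content_fingerprint(copied))  # -> True
print(content_fingerprint(original) == content_fingerprint(spun))    # -> False
```

A verbatim copy with only cosmetic changes produces the same fingerprint, while a single synonym swap produces a different one; detecting spun content therefore requires fuzzier similarity measures than exact matching.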

Criticisms

Article spinning is a way to create what looks like new content from existing content. As such, it can be seen as unethical, whether it involves paraphrasing copyrighted material (to try to evade copyright), deceiving readers into wasting their time for the spinner's benefit (without providing them any additional value), or both. [6]

References

  1. Gossman, Kathleen (June 15, 2012). "Spinning gets you nowhere."
  2. Akbari 2020, Abstract.
  3. Akbari 2020, Online paraphrasing tools.
  4. "Webmaster Help Centre: Little or no original content". Google Inc. Archived from the original on August 26, 2007. Retrieved September 18, 2007.
  5. Sullivan, Danny (February 25, 2011). "Google Forecloses On Content Farms With 'Panda' Algorithm Update". Search Engine Land. Retrieved December 7, 2022.
  6. Edwards, Suzzane (December 14, 2011). "Eight Good Reasons Why Spinning Articles is Bad for your Website". Search Engine Journal. Retrieved July 24, 2017.

Bibliography