Content farm

A content farm or content mill is a company that employs large numbers of freelance writers or uses automated tools to generate large amounts of textual web content designed to satisfy search engine ranking algorithms and so maximize retrieval, a practice known as search engine optimization (SEO). Its main goal is to generate advertising revenue by attracting reader page views, [1] as first exposed in the context of social spam. [2]

Articles in content farms have been found to contain identical passages across several media sources, raising questions about whether the sites place SEO goals above factual relevance. [3] Proponents of content farms claim that, from a business perspective, traditional journalism is inefficient. [1] Content farms often commission their writers' work based on analysis of search engine queries, which proponents represent as "true market demand", a feature that traditional journalism purportedly lacks. [1]
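As an illustration of how identical passages shared between two articles might be detected, the following minimal sketch compares overlapping word windows ("shingles") using Jaccard similarity. It is not the method used by the cited source; the function names and the window size `k` are assumptions chosen for the example.

```python
def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    # Texts shorter than k words yield an empty set.
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shared_passage_score(doc_a, doc_b, k=5):
    """Score in [0, 1]; higher values mean more word sequences in common."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))
```

A score near 1 suggests the two texts are largely the same passage, while a score near 0 suggests little verbatim overlap. Any threshold for flagging a pair would have to be chosen empirically.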

Characteristics

Some sites labeled as content farms may contain many articles and have been valued in the millions of dollars. In 2009, Wired magazine reported that, according to Richard Rosenblatt, founder and CEO of Demand Media (which includes eHow), "by next summer, Demand will be publishing one million items a month, the equivalent of four English-language Wikipedias a year". [4] Another site, Associated Content, was purchased by Yahoo! in May 2010 for $90 million. [5] The site was renamed Yahoo! Voices and was shut down in 2014. [6]

Pay rates for content are low compared with the salaries traditionally received by writers.[citation needed] One company compensated writers at a rate of $3.50 per article.[citation needed] Such rates are substantially lower than a typical writer might receive working for mainstream online publications; however, some content farm contributors produce many articles per day and may earn enough to live on. It has been observed that content writers are mostly women with children, English majors, or journalism students seeking supplemental income while working at home. [7]

Since large language models emerged and became popular, content farms have started using them to generate content automatically, without any need for human authors. [8]

AI tools make it easy to fill up sites with massive amounts of content. When quality is not an issue, programs like ChatGPT can produce articles at an unprecedented rate. Google Ads provides 90 percent of the advertisements alongside this content, as large internet companies are willing to sustain this sort of business model. [9]

Criticisms

Critics allege that content farms provide relatively low-quality content, [10] and that they maximize profit by producing "just good enough" material rather than high-quality articles. [11] Articles written by human authors (rather than by automated techniques) are often not written by specialists in the subjects covered. Some authors working for sites identified as content farms have admitted knowing little about the fields on which they report. [12]

Search engines see content farms as a problem, as they tend to lead users to less relevant and lower-quality search results. [13] The reduced quality and rapid creation of articles on such sites have drawn comparisons to the fast food industry [14] and to pollution:

Information consumers end up with less relevant or valuable resources. Producers of relevant resources receive less cash as a reward (lower clickthrough rate) while producers of junk receive more cash. One way to describe this is pollution. Virtual junk pollutes the Web environment by adding noise. Everybody but the polluters pays a price for Web pollution: search engines work less well, users waste precious time and attention on junk sites, and honest publishers lose income. The polluter spoils the Web environment for everybody else.

Markines, Benjamin; Cattuto, Ciro; Menczer, Filippo, "Social Spam Detection" [2]

Not only is the content produced by these systems "low-effort," but these avenues are also used to spread misinformation. For example, conspiracy theories regarding COVID-19 were peddled by content farms, encouraging engagement by feeding into the mass paranoia. The websites promoting these ideas often also shroud the identities of those making editing decisions, making it even more difficult to identify an agenda. [15]

Content farms are also criticised for being the source of fake ad impressions, [16] a form of ad fraud, which takes an unfair share of available advertising spend away from legitimate publishers. [17]

Reception

In one of Google's promotional videos for search published in the summer of 2010, the majority of the links available were reported to be produced at content farms. [18] In late February 2011, Google announced it was adjusting search algorithms significantly to "provide better rankings for high-quality sites—sites with original content and information such as research, in-depth reports, thoughtful analysis and so on." [19] This was reported to be a reaction to content farms and an attempt to reduce their effectiveness in manipulating search result rankings. [20]

Gabriel Weinberg, creator of the privacy-focused search engine DuckDuckGo, has reported that his search engine makes efforts to block content from content farms. [21]

Research

Since their 2011 appearance on the web, content farms have not yet received much explicit attention from the research community. The model of hiring inexpensive freelancers to produce content of marginal or questionable quality was first discussed as an alternative strategy to generating fake content automatically. The same work gave an example of the infrastructure necessary to make content-farm-based sites profitable through online ads, along with techniques to detect social spam that promotes such content. [2]

While not explicitly motivated by content farms, there has been recent interest in the automatic categorisation of websites according to the quality of their content. [22] [23] A detailed study on the application of these methods to the identification of content farm pages is yet to be done.[citation needed]

Related Research Articles

Google Search Search engine from Google

Google Search is a search engine operated by Google. It allows users to search for information on the Internet by entering keywords or phrases. Google Search uses algorithms to analyze and rank websites based on their relevance to the search query. It is the most popular search engine worldwide.

Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.

Search engine optimization (SEO) is the process of improving the quality and quantity of website traffic to a website or a web page from search engines. SEO targets unpaid traffic rather than direct traffic or paid traffic. Unpaid traffic may originate from different kinds of searches, including image search, video search, academic search, news search, and industry-specific vertical search engines.

Link farm Group of websites that link to each other

On the World Wide Web, a link farm is any group of websites that all hyperlink to other sites in the group for the purpose of increasing SEO rankings. In graph theoretic terms, a link farm is a clique. Although some link farms can be created by hand, most are created through automated programs and services. A link farm is a form of spamming the index of a web search engine. Other link exchange systems are designed to allow individual websites to selectively exchange links with other relevant websites, and are not considered a form of spamdexing.
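The clique description can be made concrete with a minimal sketch: given a map from each site to the set of sites it links to, the group forms a clique when every site links to every other site in the group. The function name and the data layout are assumptions for illustration only.

```python
from itertools import permutations


def is_link_clique(links):
    """Return True if every site in the group links to every other site.

    links: dict mapping each site name to the set of site names it links to.
    """
    sites = set(links)
    # Check every ordered pair (a, b) of distinct sites for a link a -> b.
    return all(b in links[a] for a, b in permutations(sites, 2))


# Hypothetical three-site group in which each site links to the other two.
farm = {
    "a.example": {"b.example", "c.example"},
    "b.example": {"a.example", "c.example"},
    "c.example": {"a.example", "b.example"},
}
```

Here `is_link_clique(farm)` evaluates to true. Removing any single link from the map makes it false.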

Google bombing Practice that causes a webpage to have a high rank in Google

The terms Google bombing and Googlewashing refer to the practice of causing a website to rank highly in web search engine results for irrelevant, unrelated or off-topic search terms by linking heavily. In contrast, search engine optimization (SEO) is the practice of improving the search engine listings of web pages for relevant search terms.

Metasearch engine Online information retrieval tool

A metasearch engine is an online information retrieval tool that uses the data of a web search engine to produce its own results. Metasearch engines take input from a user and immediately query search engines for results. Sufficient data is gathered, ranked, and presented to the users.

Google Scholar Academic search service by Google

Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. Released in beta in November 2004, the Google Scholar index includes peer-reviewed online academic journals and books, conference papers, theses and dissertations, preprints, abstracts, technical reports, and other scholarly literature, including court opinions and patents.

Keyword stuffing is a search engine optimization (SEO) technique, considered webspam or spamdexing, in which keywords are loaded into a web page's meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being temporarily or permanently banned or penalized on major search engines. The repetition of words in meta tags may explain why many search engines no longer use these tags. Search engines now focus more on content that is unique, comprehensive, relevant, and helpful, which makes keyword stuffing ineffective, but it is still practiced by many webmasters.

eHow Website

eHow is an online how-to guide with many articles and 170,000 videos offering step-by-step instructions. eHow articles and videos are created by freelancers and cover a wide variety of topics organized into a hierarchy of categories. Any eHow user can leave comments or responses, but only contracted writers can contribute changes to articles. The writers work on a freelance basis, being paid by article. eHow is frequently called a content farm.

A scraper site is a website that copies content from other websites using web scraping. The content is then mirrored with the goal of creating revenue, usually through advertising and sometimes by selling user data.

The Sandbox effect is a theory about the way Google ranks web pages in its index. It is the subject of much debate—its existence has been written about since 2004, but not confirmed, with several statements to the contrary.

Adversarial information retrieval is a topic in information retrieval related to strategies for working with a data source where some portion of it has been manipulated maliciously. Tasks can include gathering, indexing, filtering, retrieving and ranking information from such a data source. Adversarial IR includes the study of methods to detect, isolate, and defeat such manipulation.

In the field of search engine optimization (SEO), link building describes actions aimed at increasing the number and quality of inbound links to a webpage with the goal of increasing the search engine rankings of that page or website. Briefly, link building is the process of establishing relevant hyperlinks to a website from external sites. Link building can increase the number of high-quality links pointing to a website, in turn increasing the likelihood of the website ranking highly in search engine results. Link building is also a proven marketing tactic for increasing brand awareness.

Leaf Group American online brand company

Leaf Group, formerly Demand Media Inc., is an American content company that operates online brands, including eHow, livestrong.com, and marketplace brands Saatchi Art and Society6. The company provides social media platforms for large company websites and distributes content with social media tools to web outlets. It is commonly known for being a content farm. Demand Media was created in 2006 by a former private equity investor, Shawn Colo, and the former chairman of MySpace, Richard Rosenblatt.

Search neutrality is a principle that search engines should have no editorial policies other than that their results be comprehensive, impartial and based solely on relevance. This means that when a user types in a search engine query, the engine should return the most relevant results found in the provider's domain, without manipulating the order of the results, excluding results, or in any other way manipulating the results to a certain bias.

blekko Web search engine

Blekko, trademarked as blekko (lowercase), was a company that provided a web search engine with the stated goal of providing better search results than those offered by Google Search, with results gathered from a set of 3 billion trusted webpages and excluding such sites as content farms. The company's site, launched to the public on November 1, 2010, used slashtags to provide results for common searches. Blekko also offered a downloadable search bar. It was acquired by IBM in March 2015, and the service was discontinued.

Google Panda is a major change to Google's search results ranking algorithm that was first released in February 2011. The change aimed to lower the rank of "low-quality sites" or "thin sites", in particular "content farms", and return higher-quality sites near the top of the search results.

Social spam is unwanted spam content appearing on social networking services, social bookmarking sites, and any website with user-generated content. It can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information.

Google Penguin was a codename for a Google algorithm update that was first announced on April 24, 2012. The update aimed to lower the search engine rankings of websites that violate Google's Webmaster Guidelines by using techniques, now classed as grey hat SEM, that artificially raise a webpage's ranking by manipulating the number of links pointing to it. Such tactics are commonly described as link schemes. According to Google's John Mueller, as of 2013, Google announced all updates to the Penguin filter to the public.

Google Search, offered by Google, is the most widely used search engine on the World Wide Web as of 2023, with over eight billion searches a day. This page covers key events in the history of Google's search service.

References

  1. Dorian Benkoil (July 26, 2010). "Don't Blame the Content Farms". PBS. Archived from the original on July 28, 2010. Retrieved July 26, 2010.
  2. Markines, Benjamin; Cattuto, Ciro; Menczer, Filippo (2009), "Social Spam Detection" (PDF), Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web (AIRWeb '09), ACM, pp. 41–48, doi:10.1145/1531914.1531924, ISBN 978-1-60558-438-6, S2CID 6078349
  3. Driscoll Miller, Janet (February 1, 2011). "Content Farms: What Are They -- And Why Won't They Just Go Away?". Search Insider. MediaPost. Archived from the original on July 15, 2011. Retrieved February 21, 2014.
  4. Roth, Daniel (October 19, 2009). "The Answer Factory: Demand Media and the Fast, Disposable, and Profitable as Hell Media Model". Wired. Archived from the original on February 23, 2011. Retrieved February 27, 2011.
  5. Plesser, Andy (May 18, 2010). "Yahoo Harvests "Content Farm" Associated Content for $90 Million, Report". Beet.TV. Archived from the original on February 2, 2023.
  6. Rossiter, Jay (July 2, 2014). "Furthering Our Focus". Yahoo. Tumblr. Archived from the original on October 12, 2014. Retrieved October 7, 2014.
  7. "What It's Like To Write For Demand Media: Low Pay But Lots of Freedom". ReadWriteWeb. December 17, 2009. p. 2. Archived from the original on February 19, 2011. Retrieved November 4, 2010.
  8. Thompson, Stuart A. (May 19, 2023). "A.I.-Generated Content Discovered on News Sites, Content Farms and Product Reviews". The New York Times. ISSN 0362-4331. Retrieved February 8, 2024.
  9. Dupre, Maggie Harrison (July 2, 2023). "People Are Spinning Up Low-Effort Content Farms Using AI". Futurism. Recurrent Ventures Inc. futurism.com/content-farms-ai. Retrieved February 28, 2024.
  10. Patricio Robles (April 9, 2010). "USA Today turns to the content farm as the ship sinks". Econsultancy. Archived from the original on April 13, 2010. Retrieved July 26, 2010.
  11. Reinan, John (July 19, 2010). "I'm still waiting to make a bushel from my 'content farm' work". MinnPost. Archived from the original on July 27, 2010. Retrieved July 26, 2010.
  12. Hiar, Corbin (July 21, 2010). "Writers Explain What It's Like Toiling on the Content Farm". MediaShift. PBS. Archived from the original on March 30, 2017.
  13. MacManus, Richard (December 15, 2009). "How Google Can Combat Content Farms". ReadWriteWeb. Archived from the original on July 28, 2010.
  14. Arrington, Michael (December 13, 2009). "The End Of Hand Crafted Content". TechCrunch.
  15. Marr, Bernard (October 5, 2023). "The Danger of AI Content Farms". Forbes. www.forbes.com/sites/bernardmarr/2023/05/16/the-danger-of-ai-content-farms/?sh=82f8e3b4fcab. Retrieved February 28, 2024.
  16. Buzz, Carles (September 25, 2015). "How to Build a Content Farm in 20 Minutes". Vice. Retrieved February 8, 2024.
  17. Radsch, Courtney C. (2023). Content Farms and the Limitations of Copyright for Independent Media (Report). Centre for International Governance Innovation. pp. 16–17.
  18. Wauters, Robin (July 23, 2010). "Google's New Video Ad Highlights How Content Farms Rule At The Search Game". TechCrunch. Archived from the original on April 13, 2021.
  19. Singhal, Amit; Cutts, Matt. "Finding more high-quality sites in search". Official Google Blog. Blogspot. Archived from the original on February 26, 2011. Retrieved February 26, 2011.
  20. Guynn, Jessica (February 26, 2011). "Google makes major change in search ranking algorithms". Los Angeles Times. Archived from the original on February 27, 2011. Retrieved February 26, 2011.
  21. "The Search Engine Backlash Against 'Content Mills'". MIT Technology Review. Retrieved February 28, 2023.
  22. "Discovery Challenge 2010". ECMLP KDD 2010. 2010. Archived from the original on April 9, 2011. Retrieved April 22, 2011.
  23. "Joint WICOW/AIRWeb Workshop on Web Quality". dl.kuis.kyoto-u.ac.jp. 2011. Archived from the original on February 14, 2020.