Scunthorpe problem

An example of the Scunthorpe problem in Wikipedia because of a regular expression identifying "cunt" in the username

The Scunthorpe problem is the unintentional blocking of online content by a spam filter or search engine because the text contains a string (or substring) of letters that appears to have an obscene or otherwise unacceptable meaning. Names, abbreviations, and technical terms are most often cited as being affected by the issue.

The problem arises because computers can easily identify strings of text within a document, but deciding whether such a string is actually offensive requires interpreting a wide range of contexts, possibly across many cultures, which is extremely difficult. As a result, broad blocking rules can produce false positives that affect many innocent phrases.
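
The gap between matching a string and understanding a word can be seen in a minimal sketch of a substring-based filter. The blocklist contents and the function name `naive_filter_blocks` are assumptions made for illustration; this is not a description of how any filter named in this article was implemented.

```python
# Hypothetical blocklist, for illustration only.
BLOCKED_SUBSTRINGS = ("cunt",)


def naive_filter_blocks(text: str) -> bool:
    """Return True if any blocked string occurs anywhere in the text.

    The check is a plain, case-insensitive substring search: it has no
    notion of word boundaries, place names, or context.
    """
    lowered = text.lower()
    return any(bad in lowered for bad in BLOCKED_SUBSTRINGS)


# An innocent place name is rejected because of the letters it contains.
print(naive_filter_blocks("I live in Scunthorpe"))
# A text without the substring passes.
print(naive_filter_blocks("I live in Lincoln"))
```

A filter of this shape blocks the town name for the same reason it blocks the obscenity: both contain the same run of letters, and the code has nothing else to go on.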

Etymology and origin

The problem was named after an incident in April 1996 in which AOL's profanity filter prevented people in the English town of Scunthorpe from creating AOL accounts because the town's name contains the substring "cunt". [1] In the early 2000s, Google's opt-in SafeSearch made the same error, with local services and businesses that included the town in their names or URLs among those mistakenly hidden from search results. [2]

Workarounds

The Scunthorpe problem is difficult to solve completely because it is hard to build a filter that understands words in context. [3] [4]

One workaround is to create a whitelist of known false positives. The filter ignores any word that appears on the whitelist, even though it contains text that would otherwise be blocked. [5]
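
A whitelist check of this kind might be sketched as follows. The list contents, the word-splitting rule, and the function name `filter_blocks` are all assumptions chosen for illustration rather than a documented implementation.

```python
import re

# Hypothetical lists, for illustration only.
BLOCKED_SUBSTRINGS = ("cunt",)
ALLOWED_WORDS = {"scunthorpe"}  # known false positives


def filter_blocks(text: str) -> bool:
    """Return True if the text should be blocked.

    Each word is checked against the whitelist first; only words that
    are not whitelisted are scanned for blocked substrings.
    """
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in ALLOWED_WORDS:
            continue  # known false positive: skip the substring scan
        if any(bad in word for bad in BLOCKED_SUBSTRINGS):
            return True
    return False


print(filter_blocks("Welcome to Scunthorpe"))    # whitelisted word is ignored
print(filter_blocks("Scunthorpes of the world"))  # variant not on the whitelist
```

The second call shows the limit of the approach: a whitelist only covers the false positives someone has already listed, so an unlisted variant is still blocked.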

Other examples

Mistaken decisions by obscenity filters include:

Refused web domain names and account registrations

Blocked web searches

Blocked emails

Blocked for words with multiple meanings

News articles

Video games

Other

See also

References

  1. Clive Feather (25 April 1996). Peter G. Neumann (ed.). "AOL censors British town's name!". The Risks Digest. 18 (7).
  2. McCullagh, Declan (23 April 2004). "Google's chastity belt too tight". CNET. Archived from the original on 16 June 2011.
  3. Oberhaus, Daniel (29 August 2018). "Life on the Internet Is Hard When Your Last Name is 'Butts'". Vice. Retrieved 31 July 2022.
  4. Gellis, Cathy (31 August 2018). "The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale". Techdirt. Retrieved 31 July 2022.
  5. Veale, Tony (2021). Your Wit Is My Command: Building AIs with a Sense of Humor. MIT Press. p. 231. ISBN 978-0-262-04599-5. OCLC 1221016857.
  6. Festa, Paul (27 April 1998). "Food domain found "obscene"". News.com. Archived from the original on 10 May 2020.
  7. "Foire aux questions". radio-canada.ca. Archived from the original on 21 October 2012. Retrieved 24 February 2011.
  8. Barker, Garry (26 February 2004). "How Mr C0ckburn fought spam". The Sydney Morning Herald. Archived from the original on 3 September 2009.
  9. Cockburn, Craig (9 March 2010). "BBC fail – my correct name is not permitted". blog.siliconglen.com. Archived from the original on 30 September 2020.
  10. "Is Yahoo Banning Allah?". Kallahar's Place. Archived from the original on 14 January 2016. Retrieved 24 February 2011.
  11. Rubin, Daniel. "When your name gets turned against you". The Philadelphia Inquirer. Archived from the original on 5 August 2008. Retrieved 3 August 2008.
  12. "E-Rate And Filtering: A Review Of The Children's Internet Protection Act". Congressional Hearings. General. Energy and Commerce, Subcommittee on Telecommunications and the Internet. 4 April 2001.
  13. "F-Word Town's Name Gets Censored By Internet Filter". Archived from the original on 1 December 2008. Retrieved 27 July 2011.
  14. Chin, Josh (6 July 2011). "Following Jiang Death Rumors, China's Rivers Go Missing". The Wall Street Journal. Archived from the original on 13 August 2011.
  15. Molloy, Mark (27 February 2018). "Wine lovers cannot buy Burgundy tipple on Google as internet giant cracks down on 'gun' searches". The Telegraph. Archived from the original on 2 March 2018. Retrieved 27 February 2018.
  16. "Yahoo admits mangling e-mail". BBC News. 19 July 2002. Archived from the original on 26 January 2021. Retrieved 21 June 2013.
  17. "Hard news". Need To Know 2002-07-12. 12 July 2002. Retrieved 21 June 2013.
  18. Knight, Will (15 July 2002). "Email security filter spawns new words". New Scientist. Archived from the original on 24 September 2020. Retrieved 21 June 2013.
  19. "E-mail vetting blocks MPs' sex debate". BBC News. 4 February 2003. Archived from the original on 4 February 2021.
  20. "Software blocks MPs' Welsh e-mail". BBC News. 5 February 2003. Archived from the original on 4 February 2021.
  21. Kwintner, Adrian (5 October 2004). "Name of museum is confused with porn". News Shopper.
  22. Jones, Sam (13 October 2004). "Panto email falls foul of filth filter". The Guardian. Archived from the original on 4 February 2021.
  23. "E-mail filter blocks 'erection'". 30 May 2006. Archived from the original on 4 February 2021.
  24. "The Beaver mag renamed to end porn mix-up". The Sydney Morning Herald. Agence France-Presse. 13 January 2010. Archived from the original on 9 November 2020. Retrieved 24 February 2021.
  25. Austen, Ian (24 January 2010). "Web Filters Cause Name Change for a Magazine". The New York Times. Archived from the original on 9 November 2020. Retrieved 24 February 2021.
  26. Sheerin, Jude (29 March 2010). "How spam filters dictated Canadian magazine's fate". BBC News. Archived from the original on 16 January 2021.
  27. "Luxemburger Twitter-Neubenutzer nach 29 Minuten blockiert" [Luxembourg new Twitter user blocked after 29 minutes]. Tageblatt (in German). 22 June 2010. Retrieved 12 June 2010. [dead link]
  28. "Black Country Councillor Caught up in Faggots Farce". Birmingham Mail. 24 February 2011.
  29. Tom Chatfield (17 April 2013). "The 10 best words the internet has given English". The Guardian.
  30. Keyes, Ralph (2010). Unmentionables: From Family Jewels to Friendly Fire – What We Say Instead of What We Mean. John Murray. ISBN 978-1-84854-456-7.
  31. Maher, Kris. "Don't Let Spam Filters Snatch Your Resume". Career Journal. Archived from the original on 23 October 2006. Retrieved 11 February 2008.
  32. Frauenfelder, Mark (30 June 2008). "Homophobic news site changes athlete Tyson Gay to Tyson Homosexual". Boing Boing. Archived from the original on 4 February 2021.
  33. Arthur, Charles (30 June 2008). "Computer autocorrects surname 'gay' to.. no, you guess". The Guardian. Archived from the original on 13 November 2020.
  34. Mantyla, Kyle (30 June 2008). "The Dangers of Auto-Replace". Right Wing Watch. People for the American Way. Archived from the original on 25 October 2020. Retrieved 24 February 2021.
  35. Williams, Joe (6 August 2015). "US newspaper claims Hiroshima bombing caused by 'homosexual' plane". PinkNews. Retrieved 14 January 2025.
  36. "Hiroshima Atomic Bombing 70th Anniversary Marked with Solemn Ceremony, Calls for Nuclear Disarmament". Observer Chronicle. 6 August 2015. Archived from the original on 11 August 2015.
  37. Moore, Matthew (2 September 2008). "The Clbuttic Mistake: When obscenity filters go wrong". The Telegraph. Archived from the original on 23 February 2020.
  38. Dengsø, Christopher (19 July 2023). "The Clbuttic Mistake: A Thing Of The Past?". Moderation API. Retrieved 25 November 2024.
  39. "Clbuttic mistake". Collins Dictionary. Retrieved 25 November 2024.
  40. "Microsoft Confirms "Gaywood" Is An Offensive Surname, Mr. Gaywood Responds". May 2008. Archived from the original on 9 November 2012.
  41. Keating, Lauren (17 February 2016). "These Are The Words Nintendo Censors From Appearing On The 3DS". Tech Times. Retrieved 14 November 2023.
  42. Gibbs, Samuel (21 January 2014). "UK porn filter blocks game update that contained 'sex'". The Guardian. London. Archived from the original on 11 November 2020.
  43. Mozur, Paul; Tejada, Carlos (13 February 2013). "China's 'Wall' Hits Business". The Wall Street Journal. Archived from the original on 10 September 2013. Retrieved 25 May 2013.
  44. "Faggots and peas fall foul of Facebook censors". Express & Star. November 2013. Archived from the original on 10 May 2020.
  45. Ferguson, Amber (22 May 2018). "Proud mom orders 'Summa Cum Laude' cake online. Publix censors it: Summa … Laude". The Washington Post. Archived from the original on 22 May 2018. Retrieved 22 May 2018.
  46. Amatulli, Jenna (22 May 2018). "Publix Censors Teen's 'Summa Cum Laude' Graduation Cake". The Huffington Post. Archived from the original on 5 September 2018.
  47. Hern, Alex (27 May 2020). "Anti-porn filters stop Dominic Cummings trending on Twitter". The Guardian. Archived from the original on 20 February 2021.
  48. Ferreira, Becky (15 October 2020). "A Profanity Filter Banned the Word 'bone' at a Paleontology Conference". Motherboard. Archived from the original on 23 February 2021.
  49. Morris, Steven (27 January 2021). "Facebook apologises for flagging Plymouth Hoe as offensive term". The Guardian. Archived from the original on 29 January 2021.
  50. Kempf, Cédric (12 April 2021). "Insolite : Bitche est censuré par Facebook". Radio Mélodie (in French).
  51. Darmanin, Jules (13 April 2021). "Facebook takes down official page for French town of Bitche". POLITICO. Retrieved 3 July 2021.