Scunthorpe problem

An example of the Scunthorpe problem on Wikipedia, caused by a regular expression matching "cunt" in a username

The Scunthorpe problem is the unintentional blocking of online content by a spam filter or search engine because the text contains a string (or substring) of letters that appears to have an obscene or otherwise unacceptable meaning. Names, abbreviations, and technical terms are most often cited as being affected by the issue.

The problem arises because computers can easily identify strings of text within a document, but deciding whether a matched string is actually offensive requires interpreting a wide range of contexts, possibly across many cultures, which is an extremely difficult task. As a result, broad blocking rules can produce false positives that affect many innocent phrases.
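
The failure mode can be illustrated with a minimal sketch, assuming a filter that does nothing more than case-insensitive substring matching. The function name `naive_filter` and the one-entry blocklist are hypothetical and are not taken from any real product:

```python
# Illustrative blocklist only; a real filter would use a much longer list.
BLOCKED_SUBSTRINGS = ("cunt",)


def naive_filter(text: str) -> bool:
    """Return True if plain substring matching would block the text."""
    lowered = text.lower()
    return any(bad in lowered for bad in BLOCKED_SUBSTRINGS)


# The town name is blocked even though it is entirely innocent.
print(naive_filter("Scunthorpe"))  # True
```

The check has no notion of word boundaries or context, so any name that happens to contain a blocked string is rejected.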

Etymology and origin

The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England, from creating accounts with AOL, because the town's name contains the substring "cunt". [1] In the early 2000s, Google's opt-in SafeSearch filters made the same error, with local services and businesses that included Scunthorpe in their names or URLs among those mistakenly excluded from appearing in search results. [2]

Workarounds

The Scunthorpe problem is difficult to solve completely because it is hard to build a filter that understands words in context. [3] [4]

One solution involves creating a whitelist of known false positives. Any word appearing on the whitelist can be ignored by the filter, even though it contains text that would otherwise not be allowed. [5]
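
The whitelist approach might look roughly like the following sketch. It assumes the filter splits text into words and checks each word on its own; the names `WHITELIST`, `BLOCKED_SUBSTRINGS`, and `filter_with_whitelist` are hypothetical, and the word-splitting rule is a simplification:

```python
import re

# Illustrative lists only; both would be far longer in practice.
BLOCKED_SUBSTRINGS = ("cunt",)
WHITELIST = {"scunthorpe"}  # known false positives, stored in lower case


def filter_with_whitelist(text: str) -> bool:
    """Return True if any non-whitelisted word contains a blocked substring."""
    for word in re.findall(r"\w+", text.lower()):
        if word in WHITELIST:
            continue  # known false positive: skip the substring check
        if any(bad in word for bad in BLOCKED_SUBSTRINGS):
            return True
    return False


print(filter_with_whitelist("Welcome to Scunthorpe"))  # False
```

A whitelist only covers false positives that someone has already listed: in this sketch a variant such as "Scunthorpes" would still be blocked, which is one reason the approach is a workaround rather than a full solution.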

Other examples

Mistaken decisions by obscenity filters include:

Refused web domain names and account registrations

Blocked web searches

Blocked emails

Blocked for words with multiple meanings

News articles

Other

See also


References

  1. Feather, Clive (25 April 1996). Neumann, Peter G. (ed.). "AOL censors British town's name!". The Risks Digest. 18 (7).
  2. McCullagh, Declan (23 April 2004). "Google's chastity belt too tight". CNET. Archived from the original on 16 June 2011.
  3. Oberhaus, Daniel (29 August 2018). "Life on the Internet Is Hard When Your Last Name is 'Butts'". Vice. Retrieved 31 July 2022.
  4. Gellis, Cathy (31 August 2018). "The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale". Techdirt. Retrieved 31 July 2022.
  5. Veale, Tony (2021). Your Wit Is My Command: Building AIs with a Sense of Humor. MIT Press. p. 231. ISBN 978-0-262-04599-5. OCLC 1221016857.
  6. Festa, Paul (27 April 1998). "Food domain found "obscene"". News.com. Archived from the original on 10 May 2020.
  7. "Foire aux questions". radio-canada.ca. Archived from the original on 21 October 2012. Retrieved 24 February 2011.
  8. Barker, Garry (26 February 2004). "How Mr C0ckburn fought spam". The Sydney Morning Herald. Archived from the original on 3 September 2009.
  9. Cockburn, Craig (9 March 2010). "BBC fail – my correct name is not permitted". blog.siliconglen.com. Archived from the original on 30 September 2020.
  10. "Is Yahoo Banning Allah?". Kallahar's Place. Archived from the original on 14 January 2016. Retrieved 24 February 2011.
  11. Rubin, Daniel. "When your name gets turned against you". The Philadelphia Inquirer. Archived from the original on 5 August 2008. Retrieved 3 August 2008.
  12. "E-Rate And Filtering: A Review Of The Children's Internet Protection Act". Congressional Hearings. General. Energy and Commerce, Subcommittee on Telecommunications and the Internet. 4 April 2001.
  13. "F-Word Town's Name Gets Censored By Internet Filter". Archived from the original on 1 December 2008. Retrieved 27 July 2011.
  14. Chin, Josh (6 July 2011). "Following Jiang Death Rumors, China's Rivers Go Missing". The Wall Street Journal. Archived from the original on 13 August 2011.
  15. Molloy, Mark (27 February 2018). "Wine lovers cannot buy Burgundy tipple on Google as internet giant cracks down on 'gun' searches". The Telegraph. Archived from the original on 2 March 2018. Retrieved 27 February 2018.
  16. "Yahoo admits mangling e-mail". BBC News. 19 July 2002. Archived from the original on 26 January 2021. Retrieved 21 June 2013.
  17. "Hard news". Need To Know 2002-07-12. 12 July 2002. Retrieved 21 June 2013.
  18. Knight, Will (15 July 2002). "Email security filter spawns new words". New Scientist. Archived from the original on 24 September 2020. Retrieved 21 June 2013.
  19. "E-mail vetting blocks MPs' sex debate". BBC News. 4 February 2003. Archived from the original on 4 February 2021.
  20. "Software blocks MPs' Welsh e-mail". BBC News. 5 February 2003. Archived from the original on 4 February 2021.
  21. Kwintner, Adrian (5 October 2004). "Name of museum is confused with porn". News Shopper.
  22. Jones, Sam (13 October 2004). "Panto email falls foul of filth filter". The Guardian. Archived from the original on 4 February 2021.
  23. "E-mail filter blocks 'erection'". 30 May 2006. Archived from the original on 4 February 2021.
  24. "The Beaver mag renamed to end porn mix-up". The Sydney Morning Herald. Agence France-Presse. 13 January 2010. Archived from the original on 9 November 2020. Retrieved 24 February 2021.
  25. Austen, Ian (24 January 2010). "Web Filters Cause Name Change for a Magazine". The New York Times. Archived from the original on 9 November 2020. Retrieved 24 February 2021.
  26. Sheerin, Jude (29 March 2010). "How spam filters dictated Canadian magazine's fate". BBC News. Archived from the original on 16 January 2021.
  27. "Luxemburger Twitter-Neubenutzer nach 29 Minuten blockiert" [Luxembourg new Twitter user blocked after 29 minutes]. Tageblatt (in German). 22 June 2010. Retrieved 12 June 2010.
  28. "Black Country Councillor Caught up in Faggots Farce". Birmingham Mail. 24 February 2011.
  29. Chatfield, Tom (17 April 2013). "The 10 best words the internet has given English". The Guardian.
  30. Keyes, Ralph (2010). Unmentionables: From Family Jewels to Friendly Fire – What We Say Instead of What We Mean. John Murray. ISBN 978-1-84854-456-7.
  31. Maher, Kris. "Don't Let Spam Filters Snatch Your Resume". Career Journal. Archived from the original on 23 October 2006. Retrieved 11 February 2008.
  32. Frauenfelder, Mark (30 June 2008). "Homophobic news site changes athlete Tyson Gay to Tyson Homosexual". Boing Boing. Archived from the original on 4 February 2021.
  33. Arthur, Charles (30 June 2008). "Computer autocorrects surname 'gay' to.. no, you guess". The Guardian. Archived from the original on 13 November 2020.
  34. Mantyla, Kyle (30 June 2008). "The Dangers of Auto-Replace". Right Wing Watch. People for the American Way. Archived from the original on 25 October 2020. Retrieved 24 February 2021.
  35. Moore, Matthew (2 September 2008). "The Clbuttic Mistake: When obscenity filters go wrong". The Telegraph. Archived from the original on 23 February 2020.
  36. "Microsoft Confirms "Gaywood" Is An Offensive Surname, Mr. Gaywood Responds". May 2008. Archived from the original on 9 November 2012.
  37. Keating, Lauren (17 February 2016). "These Are The Words Nintendo Censors From Appearing On The 3DS". Tech Times. Retrieved 14 November 2023.
  38. Mozur, Paul; Tejada, Carlos (13 February 2013). "China's 'Wall' Hits Business". The Wall Street Journal. Archived from the original on 10 September 2013. Retrieved 25 May 2013.
  39. "Faggots and peas fall foul of Facebook censors". Express & Star. November 2013. Archived from the original on 10 May 2020.
  40. Gibbs, Samuel (21 January 2014). "UK porn filter blocks game update that contained 'sex'". The Guardian. London. Archived from the original on 11 November 2020.
  41. Ferguson, Amber (22 May 2018). "Proud mom orders 'Summa Cum Laude' cake online. Publix censors it: Summa … Laude". The Washington Post. Archived from the original on 22 May 2018. Retrieved 22 May 2018.
  42. Amatulli, Jenna (22 May 2018). "Publix Censors Teen's 'Summa Cum Laude' Graduation Cake". The Huffington Post. Archived from the original on 5 September 2018.
  43. Hern, Alex (27 May 2020). "Anti-porn filters stop Dominic Cummings trending on Twitter". The Guardian. Archived from the original on 20 February 2021.
  44. Ferreira, Becky (15 October 2020). "A Profanity Filter Banned the Word 'bone' at a Paleontology Conference". Motherboard. Archived from the original on 23 February 2021.
  45. Morris, Steven (27 January 2021). "Facebook apologises for flagging Plymouth Hoe as offensive term". The Guardian. Archived from the original on 29 January 2021.
  46. Kempf, Cédric (12 April 2021). "Insolite : Bitche est censuré par Facebook". Radio Mélodie (in French).
  47. Darmanin, Jules (13 April 2021). "Facebook takes down official page for French town of Bitche". POLITICO. Retrieved 3 July 2021.