The Scunthorpe problem is the unintentional blocking of online content by a spam filter or search engine because their text contains a string (or substring) of letters that appear to have an obscene or otherwise unacceptable meaning. Names, abbreviations, and technical terms are most often cited as being affected by the issue.
The problem arises since computers can easily identify strings of text within a document, but interpreting words of this kind requires considerable ability to interpret a wide range of contexts, possibly across many cultures, which is an extremely difficult task. As a result, broad blocking rules may result in false positives affecting many innocent phrases.
The problem was named after an incident in April 1996 in which AOL's profanity filter prevented people in the English town of Scunthorpe from creating AOL accounts because the town's name contains the substring "cunt". [1] In the early 2000s, Google's opt-in SafeSearch made the same error, with local services and businesses that included the town in their names or URLs among those mistakenly hidden from search results. [2]
The Scunthorpe problem is challenging to completely solve due to the difficulty of creating a filter capable of understanding words in context. [3] [4]
One solution involves creating a whitelist of known false positives. Any word appearing on the whitelist can be ignored by the filter, even though it contains text that would otherwise not be allowed. [5]
Mistaken decisions by obscenity filters include:
An Internet filter is software that restricts or controls the content an Internet user is capable to access, especially when utilized to restrict material delivered over the Internet via the Web, Email, or other means. Such restrictions can be applied at various levels: a government can attempt to apply them nationwide, or they can, for example, be applied by an Internet service provider to its clients, by an employer to its personnel, by a school to its students, by a library to its visitors, by a parent to a child's computer, or by an individual user to their own computers. The motive is often to prevent access to content which the computer's owner(s) or other authorities may consider objectionable. When imposed without the consent of the user, content control can be characterised as a form of internet censorship. Some filter software includes time control functions that empowers parents to set the amount of time that child may spend accessing the Internet or playing games or other computer activities.
Electronic mail is a method of transmitting and receiving digital messages using electronic devices over a computer network. It was conceived in the late–20th century as the digital version of, or counterpart to, mail. Email is a ubiquitous and very widely used communication medium; in current use, an email address is often treated as a basic and necessary part of many processes in business, commerce, government, education, entertainment, and other spheres of daily life in most countries.
A Domain Name System blocklist, Domain Name System-based blackhole list, Domain Name System blacklist (DNSBL) or real-time blackhole list (RBL) is a service for operation of mail servers to perform a check via a Domain Name System (DNS) query whether a sending host's IP address is blacklisted for email spam. Most mail server software can be configured to check such lists, typically rejecting or flagging messages from such sites.
A mailing list is a collection of names and addresses used by an individual or an organization to send material to multiple recipients. The term is often extended to include the people subscribed to such a list, so the group of subscribers is referred to as "the mailing list", or simply "the list".
Microsoft Outlook is a personal information manager software system from Microsoft, available as a part of the Microsoft 365 software suites. Primarily popular as an email client for businesses, Outlook also includes functions such as calendaring, task managing, contact managing, note-taking, journal logging, web browsing, and RSS news aggregation.
Jonathan L. Zittrain is an American professor of Internet law and the George Bemis Professor of International Law at Harvard Law School. He is also a professor at the Harvard Kennedy School, a professor of computer science at the Harvard School of Engineering and Applied Sciences, and co-founder and director of the Berkman Klein Center for Internet & Society. Previously, Zittrain was Professor of Internet Governance and Regulation at the Oxford Internet Institute of the University of Oxford and visiting professor at the New York University School of Law and Stanford Law School. He is the author of The Future of the Internet and How to Stop It as well as co-editor of the books, Access Denied, Access Controlled, and Access Contested.
An Internet forum, or message board, is an online discussion site where people can hold conversations in the form of posted messages. They differ from chat rooms in that messages are often longer than one line of text, and are at least temporarily archived. Also, depending on the access level of a user or the forum set-up, a posted message might need to be approved by a moderator before it becomes publicly visible.
Various anti-spam techniques are used to prevent email spam.
Email spam, also referred to as junk email, spam mail, or simply spam, is unsolicited messages sent in bulk by email (spamming). The name comes from a Monty Python sketch in which the name of the canned pork product Spam is ubiquitous, unavoidable, and repetitive. Email spam has steadily grown since the early 1990s, and by 2014 was estimated to account for around 90% of total email traffic.
Naive Bayes classifiers are a popular statistical technique of e-mail filtering. They typically use bag-of-words features to identify email spam, an approach commonly used in text classification.
Yahoo! Mail is an email service offered by the American company Yahoo, Inc. The service is free for personal use, with an optional monthly fee for additional features. Business email was previously available with the Yahoo! Small Business brand, before it transitioned to Verizon Small Business Essentials in early 2022. Launched on October 8, 1997, as of January 2020, Yahoo! Mail has 225 million users.
The Spamhaus Project is an international organisation based in the Principality of Andorra, founded in 1998 by Steve Linford to track email spammers and spam-related activity. The name spamhaus, a pseudo-German expression, was coined by Linford to refer to an internet service provider, or other firm, which spams or knowingly provides service to spammers.
Online advertising, also known as online marketing, Internet advertising, digital advertising or web advertising, is a form of marketing and advertising that uses the Internet to promote products and services to audiences and platform users. Online advertising includes email marketing, search engine marketing (SEM), social media marketing, many types of display advertising, and mobile advertising. Advertisements are increasingly being delivered via automated software systems operating across multiple websites, media services and platforms, known as programmatic advertising.
SORBS was a list of e-mail servers suspected of sending or relaying spam. It had been augmented with complementary lists that include various other classes of hosts, allowing for customized email rejection by its users.
A wordfilter is a script typically used on Internet forums or chat rooms that automatically scans users' posts or comments as they are submitted and automatically changes or censors particular words or phrases.
Bayesian poisoning is a technique used by e-mail spammers to attempt to degrade the effectiveness of spam filters that rely on Bayesian spam filtering. Bayesian filtering relies on Bayesian probability to determine whether an incoming mail is spam or is not spam. The spammer hopes that the addition of random words that are unlikely to appear in a spam message will cause the spam filter to believe the message to be legitimate—a statistical type II error.
Backscatter is incorrectly automated bounce messages sent by mail servers, typically as a side effect of incoming spam.
The history of email spam reaches back to the mid-1990s when commercial use of the internet first became possible - and marketers and publicists began to test what was possible.
Email spammers have developed a variety of ways to deliver email spam throughout the years, such as mass-creating accounts on services such as Hotmail or using another person's network to send email spam. Many techniques to block, filter, or otherwise remove email spam from inboxes have been developed by internet users, system administrators and internet service providers. Due to this, email spammers have developed their own techniques to send email spam, which are listed below.
Social spam is unwanted spam content appearing on social networking services, social bookmarking sites, and any website with user-generated content. It can be manifested in many ways, including bulk messages, profanity, insults, hate speech, malicious links, fraudulent reviews, fake friends, and personally identifiable information.
{{cite news}}
: CS1 maint: bot: original URL status unknown (link){{cite news}}
: CS1 maint: bot: original URL status unknown (link)