Breidbart Index

The Breidbart Index, developed by Seth Breidbart, is the most significant cancel index in Usenet.

A cancel index measures the dissemination intensity of substantively identical articles. If the index exceeds a threshold, the articles are called newsgroup spam. They can then be removed using third-party cancel controls.

Cancel Index

The principal idea of the Breidbart Index is to give crossposting and multiposting different weight. [1] A crossposted message requires less data to be transferred and stored. Excessive crossposts (ECP) are also a likely beginner's error, while excessive multiposts (EMP) suggest deliberate use of special software.

The crucial issue is categorizing multiple articles as substantively identical. [2] This includes

Breidbart Index (BI)

The Breidbart Index of a set of articles is defined as the sum of the square roots of n over all articles in the set, where n is the number of newsgroups to which an article is crossposted.

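Formula

$$\mathrm{BI} = \sum_{i} \sqrt{n_i}$$

where $n_i$ is the number of newsgroups to which the $i$-th article is crossposted.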

Example

Two copies of a posting are made, one to 9 groups and one to 16. The index is then $\mathrm{BI} = \sqrt{9} + \sqrt{16} = 3 + 4 = 7$.
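The definition above can be sketched in a few lines of Python. The function name `breidbart_index` and the list-of-counts input are illustrative assumptions, not part of any Usenet tool.

```python
import math

def breidbart_index(group_counts):
    """Sum of sqrt(n) over all copies, where n is the number of
    newsgroups each copy was crossposted to (hypothetical helper)."""
    return sum(math.sqrt(n) for n in group_counts)

# The example from the text: one copy to 9 groups, one to 16.
print(breidbart_index([9, 16]))  # 7.0
```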

Breidbart Index, Version 2 (BI2)

A more aggressive criterion, Breidbart Index Version 2, has been proposed. The BI2 is defined as the sum of the square roots of n plus the sum of n, with the total divided by two. [2] A single message would only need to be crossposted to 35 newsgroups to breach the threshold of 20.

Formula

$$\mathrm{BI2} = \frac{\sum_{i} \sqrt{n_i} + \sum_{i} n_i}{2}$$

Example

Two copies of a posting are made, one to 9 groups and one to 16. The index is then $\mathrm{BI2} = \frac{\sqrt{9} + \sqrt{16} + 9 + 16}{2} = \frac{32}{2} = 16$.
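A minimal Python sketch of the BI2 definition follows. The function name is an assumption made for illustration. It also checks the statement that a single message crosses the threshold of 20 at 35 newsgroups.

```python
import math

def breidbart_index_v2(group_counts):
    """(sum of sqrt(n) + sum of n) / 2 over all copies (hypothetical helper)."""
    roots = sum(math.sqrt(n) for n in group_counts)
    total = sum(group_counts)
    return (roots + total) / 2

# The example from the text: one copy to 9 groups, one to 16.
print(breidbart_index_v2([9, 16]))    # 16.0

# A single crosspost stays below 20 at 34 groups and exceeds it at 35.
print(breidbart_index_v2([34]) > 20)  # False
print(breidbart_index_v2([35]) > 20)  # True
```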

Skirvin-Breidbart Index (SBI, BI3)

The name Skirvin-Breidbart Index and the abbreviation SBI are mentioned in the Spam Thresholds FAQ. [2] In the nl.* hierarchy, however, this index is called BI3. [3]

The SBI is calculated similarly to the BI2, but it adds up the number of groups in Followup-to: (if present) instead of the number of groups in Newsgroups:. This encourages the use of Followup-to:.

Example

Two posts contain the same text. One is crossposted to 9 groups. The other is crossposted to 16, with four groups in Followup-to:.
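The SBI example can be sketched in Python as follows. The text does not spell out whether the square-root term also switches to the Followup-to: count. This sketch assumes it does not: the square root is taken over the Newsgroups: count, and only the linear term uses Followup-to:. The function name and the tuple layout are likewise assumptions.

```python
import math

def skirvin_breidbart_index(posts):
    """posts: list of (newsgroups_count, followup_count_or_None).

    Assumption: sqrt term uses the Newsgroups: count; the linear term
    uses the Followup-to: count when present, else the Newsgroups: count.
    """
    roots = 0.0
    total = 0
    for newsgroups, followups in posts:
        roots += math.sqrt(newsgroups)
        total += followups if followups is not None else newsgroups
    return (roots + total) / 2

# One post to 9 groups, one to 16 groups with four groups in Followup-to:.
print(skirvin_breidbart_index([(9, None), (16, 4)]))  # 10.0
```

Under this reading, setting Followup-to: lowers the second post's linear contribution from 16 to 4.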

BI7 and BI30

In the de.* hierarchy, the Breidbart Index is used with a time range of seven days instead of 45. This is denoted by the abbreviation BI7. [4]

In the hamster.de.* hierarchy, the Breidbart Index is used with a time range of 30 days instead of 45. This is denoted by the abbreviation BI30. [5]
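One way to picture the effect of the time range is the Python sketch below. It assumes that only copies posted within the last N days count toward the index. The function name, the sample dates, and the group counts are all hypothetical.

```python
import math
from datetime import datetime, timedelta

def windowed_bi(posts, now, window_days=45):
    """BI over copies posted within the last `window_days` days.

    posts: list of (posted_at, newsgroups_count). The windowing rule
    is an assumption made for illustration.
    """
    cutoff = now - timedelta(days=window_days)
    return sum(math.sqrt(n) for posted_at, n in posts if posted_at >= cutoff)

now = datetime(2000, 1, 31)
posts = [
    (datetime(2000, 1, 30), 4),    # 1 day old
    (datetime(2000, 1, 11), 9),    # 20 days old
    (datetime(1999, 12, 22), 16),  # 40 days old
]

print(windowed_bi(posts, now, 7))   # 2.0  (BI7)
print(windowed_bi(posts, now, 30))  # 5.0  (BI30)
print(windowed_bi(posts, now, 45))  # 9.0  (default 45 days)
```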

Cancel Index in at.*

The cancel index used in at.* is defined in the FAQ [6] of the group at.usenet.cancel-reports. The term used in the Call for Votes [7] and in the FAQ is "Cancel-Index". Unofficial abbreviations are CI and ACI.

The ACI of a single post equals 3 plus the number of groups that the post was sent to. The index of multiple posts is the sum of the indices of the individual posts.
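The ACI rule can be sketched in Python as follows. The function name and the sample input are hypothetical.

```python
def at_cancel_index(group_counts):
    """Sum of (3 + n) over all posts, where n is the number of groups
    each post was sent to (hypothetical helper)."""
    return sum(3 + n for n in group_counts)

# Three separate posts, each sent to a single group: (3 + 1) * 3.
print(at_cancel_index([1, 1, 1]))  # 12

# One post crossposted to three groups: 3 + 3.
print(at_cancel_index([3]))        # 6
```

Because every post carries a fixed cost of 3, multiposting raises the index faster than crossposting to the same number of groups.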

Thresholds

In fact, a cancel message is just a non-binding request to remove a certain article. News server operators can freely decide how to implement the conflicting policies. [9]

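| Hierarchy | Cancel Index | Time Range (days) | Limit | Reports | Definition |
|---|---|---|---|---|---|
| * | BI | 45 | 20 | news.admin.net-abuse.bulletins | [2] |
| at.* | ACI | 45 | 11 | at.usenet.cancel-reports | [7] [10] |
| at.anzeigen.* | ACI | 14 | 4 | at.usenet.cancel-reports | [11] [12] |
| bln.* | BI | 45 | 3 | | [13] |
| de.* | BI | 7 | 5 | de.admin.net-abuse.announce | [4] [14] |
| de.alt.dateien.* | BI | 45 | 1 | de.admin.net-abuse.announce | [4] [13] |
| de.markt.* | BI | 45 | 2 | de.admin.net-abuse.announce | [4] [13] [15] |
| es.* | BI | 45 | 20 | | [16] |
| fr.* | BI | 30 | 4 | fr.usenet.abus.rapports | [17] [18] |
| hamster.de.* | BI | 30 | 3 | hamster.cancelreport | [5] |
| it.* | BI | 45 | 20 | it.news.net-abuse | [19] [20] |
| muc.* | BI | 45 | 2 | | [13] |
| nl.* | SBI | | 10 | nl.internet.misbruik.rapport | [3] |
| nrw.* | BI | 7 | 2 | | [21] |
| schule.* | BI | 14 | 3 | schule.cancelreport | [22] |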

References

  1. <news:36i6hk$3li@panix3.panix.com>
  2. Spam Thresholds FAQ
  3. "Over cancelrapporten in nl.internet.misbruik". Archived from the original on 2007-08-04. Retrieved 2009-08-14.
  4. <news:Result-festlegung-bi-06-05-1999@dana.de>, http://home.snafu.de/hweede/debi.txt
  5. Regelsatz für hamster.de.* Archived 2013-02-10 at archive.today
  6. Spam Cancel in at.* (FAQ)
  7. <news:CfV$1$at.usenet.cancel-reports$3@usenet.backbone.at>
  8. free.* FAQ
  9. Cancel Messages: Frequently Asked Questions
  10. <news:Result-festlegung-bi-06-05-1999@dana.de>
  11. <news:938889391/18913@usenet.backbone.at>
  12. <news:RESULT-Entfernung_von_at.anzeigen.computer.haendler-03.10.1999@orcus.priv.at>
  13. Die Jobbörse in den Newsgroups: de.markt.arbeit.*
  14. "Fremdcancel-FAQ". Archived from the original on 2007-06-25. Retrieved 2009-08-14.
  15. "Archived copy". Archived from the original on 2011-01-30. Retrieved 2009-08-14. <Result-de.markt.ALL-28.05.1998@dana.de>
  16. FAQ: SPAM.ES Archived 2010-05-03 at the Wayback Machine
  17. Les annulations d'articles (Cancel)
  18. Changement des règles d'annulation
  19. Tutto quello che occorre sapere sulla cancellazione
  20. Pagina Antispam in italiano
  21. Regeln der nrw.*-Hierarchie
  22. "Netiquette.txt". November 13, 2003. Archived from the original (TXT) on September 30, 2022. Retrieved December 22, 2023.