Gary Robinson | |
---|---|
Born | |
Education | Bard College; Courant Institute [1] |
Occupation | Computer programmer |
Employer | Emergent Music LLC [1] |
Known for | SpamBayes, SpamAssassin, Recommendation engine, Collaborative filtering |
Title | Chief technology officer [1] |
Website | GaryRobinson.net |
Gary Robinson is an American software engineer, mathematician [2] and inventor notable for his mathematical algorithms to fight spam. [3] In addition, he patented a method that uses web browser cookies to track consumers across different web sites, allowing marketers to better match advertisements with consumers. [4] [5] The patent was bought by DoubleClick, which was in turn bought by Google. [6] [7] He is credited as being one of the first to use automated collaborative filtering technologies to turn word-of-mouth recommendations into useful data. [2]
In 2003, Robinson's article in Linux Journal detailed a new approach to computer programming, perhaps best described as a general-purpose classifier, which expanded on the usefulness of Bayesian filtering. Robinson's method used math-intensive algorithms combined with chi-square statistical testing to enable computers to examine an unknown file and make intelligent guesses about what was in it. [8] The technique had wide applicability; for example, it enabled computers to examine a file and guess, with much greater accuracy, whether it contained pornography, or whether an incoming email to a corporation was a technical question or a sales-related question. [9]
Spamming is the abuse of electronic messaging systems to send unsolicited, undesired bulk messages. [12] Robinson's method became the basis for anti-spam techniques used by Tim Peters and Rob Hooft of the influential SpamBayes project. [10] [11] SpamBayes assigned probability scores to both spam and ham (useful emails) to guess intelligently whether an incoming email was spam; the scoring system enabled the program to return a value of "unsure" if both the spam and ham scores were high. [8] The method was also used in other anti-spam projects such as SpamAssassin. [13] [14] [15] Robinson commented in Linux Journal on how fighting spam was a collaborative effort:
The approach described here truly has been a distributed effort in the best open-source tradition. Paul Graham, an author of books on Lisp, suggested an approach to filtering spam in his on-line article, "A Plan for Spam". I took his approach for generating probabilities associated with words, altered it slightly and proposed a Bayesian calculation for dealing with words that hadn't appeared very often ... an approach based on the chi-square distribution for combining the individual word probabilities into a combined probability (actually a pair of probabilities—see below) representing an e-mail. Finally, Tim Peters of the Spambayes Project proposed a way of generating a particularly useful spamminess indicator based on the combined probabilities. All along the way the work was guided by ongoing testing of embodiments written in Python by Tim Peters for Spambayes and in C by Greg Louis of the Bogofilter Project. The testing was done by a number of people involved with those projects.
— Gary Robinson, 2003. [11]
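The steps named in the quotation (a Bayesian adjustment for rarely seen words, a chi-square combination yielding a pair of probabilities, and a single spamminess indicator) can be illustrated with a minimal Python sketch. The formulas follow the commonly cited form of the technique and should be treated as assumptions rather than a transcription of the article; the function names, the prior strength `s=1.0` and the prior guess `x=0.5` are illustrative choices, not values taken from SpamBayes or Bogofilter.

```python
import math


def smoothed_prob(p_w, n, s=1.0, x=0.5):
    """Shrink a raw per-word spam probability p_w toward a prior guess x.

    n is how many times the word has been seen; s is the weight given to the
    prior. With n == 0 the result is exactly x. (s and x are assumed values.)
    """
    return (s * x + n * p_w) / (s + n)


def chi2q(x, dof):
    """Upper-tail probability of a chi-square variable with even dof.

    Uses the closed form exp(-x/2) * sum_{i < dof/2} (x/2)**i / i!.
    For very large x, exp() underflows and the result is 0.0.
    """
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spamminess(word_probs):
    """Combine smoothed word probabilities (each strictly between 0 and 1).

    Returns an indicator in [0, 1]: near 1 suggests spam, near 0 suggests ham,
    and values near 0.5 mean the evidence is weak or conflicting.
    """
    n = len(word_probs)
    if n == 0:
        return 0.5
    ln_f = sum(math.log(f) for f in word_probs)
    ln_not_f = sum(math.log(1.0 - f) for f in word_probs)
    h = chi2q(-2.0 * ln_f, 2 * n)      # low when hammy words dominate
    s = chi2q(-2.0 * ln_not_f, 2 * n)  # low when spammy words dominate
    return (1.0 + h - s) / 2.0


if __name__ == "__main__":
    print(spamminess([0.99, 0.95, 0.90]))  # strongly spammy words
    print(spamminess([0.99, 0.01]))        # conflicting evidence
```

A filter built on this sketch would pick two cutoffs on the indicator and report "unsure" for scores that fall between them; the cutoffs themselves are a tuning decision not covered here.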
In 1996, Robinson patented a method to help marketers target their online advertisements to consumers. He explained:
As far as I have been able to tell, it's the very first patent ... to mention using web browser cookies to track consumers across different web sites and build a profile of their interests in order to determine what ads to show them ... There was an aspect in the way browser cookies were implemented that allowed them to be used ... I hired programmers to do the programming to actually test it ... the hypothesis turned out to be correct.
— Gary B. Robinson, 2014
In 2010, Robinson was the chief technology officer at FlyFi, an online music service owned by Maine-based [16] Emergent Discovery, which uses his anti-spam programming techniques along with collaborative filtering technologies to help make music recommendations to web users. [17] [18] His blog, Gary Robinson's Rants, has been quoted by others in the computer and online music industries [17] and cited in academic papers. [12] [19] [20]
Robinson helped develop recommendation engine technology, which applies mathematical techniques in software to have a computer guess intelligently about what a consumer might like. [21] For example, if a consumer likes music by artists such as the Beach Boys, Bob Dylan and Talking Heads, the software matches these preferences against a much larger dataset of other consumers who also like those three artists and who collectively have much greater musical knowledge than the single consumer. The computer can thereby find music that the user might like but has not yet been exposed to, and offer intelligent recommendations, in a process which has come to be called knowledge management. [2] The mathematics behind such comparisons can become quite complex.
Robinson studied mathematics at Bard College, graduating in 1979, and studied further at the Courant Institute of New York University. [1] In the 1980s, he worked on an entrepreneurial start-up dating service called 212-Romance, which used similar computer algorithms to match singles romantically. [2] [22] The New York City-based voice mail dating service created community-based automated recommendations and used collaborative filtering technologies which Robinson developed further in other capacities.
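The Beach Boys, Bob Dylan and Talking Heads example can be sketched as a simple user-based collaborative filter. This is an illustration only, not a description of the FlyFi or 212-Romance systems: the Jaccard similarity measure, the scoring rule, the function names and the sample listeners are all assumptions made for the sketch.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap between two sets of liked artists, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target_likes, others, top_n=3):
    """Suggest artists the target has not listed, weighted by listener similarity.

    target_likes: set of artists the target user likes.
    others: dict mapping a (hypothetical) user id to that user's liked artists.
    """
    scores = defaultdict(float)
    for likes in others.values():
        sim = jaccard(target_likes, likes)
        if sim == 0.0:
            continue  # listeners with no shared taste contribute nothing
        for artist in likes - target_likes:
            scores[artist] += sim
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [artist for artist, _ in ranked[:top_n]]


if __name__ == "__main__":
    target = {"The Beach Boys", "Bob Dylan", "Talking Heads"}
    # Invented sample data for illustration only.
    listeners = {
        "u1": {"The Beach Boys", "Bob Dylan", "Talking Heads", "The Byrds"},
        "u2": {"Bob Dylan", "Talking Heads", "The Byrds", "Television"},
        "u3": {"Metallica", "Slayer"},
    }
    print(recommend(target, listeners))
```

With the sample data, "The Byrds" ranks first because both similar listeners like that artist, while the listener with no overlapping taste is ignored. Production systems typically use weighted ratings and far larger datasets, which is where the mathematics becomes more involved.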
I make the music recommendation technology at FlyFi — Where I grew up Bronxville, NY — Companies I've worked for Athenium, OLI Systems, Lambda Technology — Schools I've attended Bard College; Courant Institute of Mathematical Sciences
Gary Robinson provided a lot of the serious maths and theory, as well as his essay on "how to do it better" (see the background page for a link).
G. Robinson, "Spam Detection", [online] 2002, ... G. Robinson, "Instructions for Training to Exhaustion", (Gary's Longer Rants), [online] 2004, (see page 8)
This article discusses one of many possible mathematical foundations for a key aspect of spam filtering—generating an indicator of "spamminess" from a collection of tokens representing the content of an e-mail.
Gary Robinson proposes an improved method for calculating the word value of a token W. His method modifies Graham's by adding a confidence factor to scale the word value by the amount of historical data that is available for the token. Let N be ...
Gary Robinson's f(x) and combining algorithms, as used in SpamAssassin
Algorithms: The Bayesian-style text classifier used by SpamAssassin's BAYES rules is based on an approach outlined by Gary Robinson. Thanks, Gary!
Gary Robinson's f(x) and combining algorithms, as used in SpamAssassin
Emergent Discovery — 565 Congress Street — Suite 201 —Portland, ME 04101
Gary Robinson, the head of Emergent Music has an article on his blog about the Three Steps To Freedom. His opinion on this definitely counts, because EM might very well be the future of music. I'm going to chime in with my thoughts here and copy them over to EM's forum as well.
Gary Robinson, CTO, is both a musician and leader in the "recommendation engine" field. Gary's background reflects his pioneering work in mathematics, technology and collaborative filtering.
So, as a "thought experiment," I have imagined the following path to creating an alternative music industry.
The FlyFi iTunes Helper sends the contents of your iTunes data file (a behind the scenes part of your iTunes library) to FlyFi server to be analyzed. By looking at your iTunes music, which is one of the best reflections of your musical tastes, FlyFi can make better new music suggestion. FlyFi can also use this information to better serve other members.
Gary Robinson, CTO, is a leader in the "recommendation engine" field. Gary's background reflects his pioneering work in mathematics, technology and collaborative filtering. For instance, as a Research Director at ActiveState, Gary's work on spam detection is now being widely adopted by the anti-spam industry, including such leading filters as SpamAssassin (PC Magazine's Editor's Choice for spam filtering), SpamSieve (MacWorld's Software of the Year) and SpamBayes (PC World's Editor's Choice for spam filtering).
(ad for 212-Romance on left side of page)