This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events in machine learning are included.
Decade | Summary |
---|---|
pre-1950 | Statistical methods are discovered and refined. |
1950s | Pioneering machine learning research is conducted using simple algorithms. |
1960s | Bayesian methods are introduced for probabilistic inference in machine learning. [1] |
1970s | 'AI winter' caused by pessimism about machine learning effectiveness. |
1980s | Rediscovery of backpropagation causes a resurgence in machine learning research. |
1990s | Work on machine learning shifts from a knowledge-driven approach to a data-driven approach. Scientists begin creating programs for computers to analyze large amounts of data and draw conclusions – or "learn" – from the results. [2] Support-vector machines (SVMs) and recurrent neural networks (RNNs) become popular. [3] Research begins on the computational complexity of neural networks and on super-Turing computation. [4] |
2000s | Support-vector clustering [5] and other kernel methods, [6] as well as unsupervised machine learning methods, become widespread. [7] |
2010s | Deep learning becomes feasible, which leads to machine learning becoming integral to many widely used software services and applications. |
Year | Event type | Caption | Event |
---|---|---|---|
1763 | Discovery | The Underpinnings of Bayes' Theorem | Thomas Bayes's work An Essay towards solving a Problem in the Doctrine of Chances is published two years after his death, having been amended and edited by a friend of Bayes, Richard Price. [8] The essay presents work which underpins Bayes' theorem. |
1805 | Discovery | Least Squares | Adrien-Marie Legendre describes the "méthode des moindres carrés", known in English as the least squares method. [9] The least squares method is used widely in data fitting. |
1812 | | Bayes' Theorem | Pierre-Simon Laplace publishes Théorie Analytique des Probabilités, in which he expands upon the work of Bayes and defines what is now known as Bayes' Theorem. [10] |
1913 | Discovery | Markov Chains | Andrey Markov first describes techniques he used to analyse a poem. The techniques later become known as Markov chains. [11] |
1943 | Discovery | Artificial Neuron | Warren McCulloch and Walter Pitts develop a mathematical model that imitates the functioning of a biological neuron, the artificial neuron, which is considered the first neural model invented. [12] |
1950 | | Turing's Learning Machine | Alan Turing proposes a 'learning machine' that can learn and become artificially intelligent. Turing's specific proposal foreshadows genetic algorithms. [13] |
1951 | | First Neural Network Machine | Marvin Minsky and Dean Edmonds build the first neural network machine capable of learning, the SNARC. [14] |
1952 | | Machines Playing Checkers | Arthur Samuel joins IBM's Poughkeepsie Laboratory and begins working on some of the very first machine learning programs, first creating programs that play checkers. [15] |
1957 | Discovery | Perceptron | Frank Rosenblatt invents the perceptron while working at the Cornell Aeronautical Laboratory. [16] The invention of the perceptron generated a great deal of excitement and was widely covered in the media. [17] |
1963 | Achievement | Machines Playing Tic-Tac-Toe | Donald Michie creates a 'machine' consisting of 304 match boxes and beads, which uses reinforcement learning to play Tic-tac-toe (also known as noughts and crosses). [18] |
1967 | | Nearest Neighbor | The nearest neighbor algorithm is created, marking the start of basic pattern recognition. The algorithm is used to map routes. [2] |
1969 | | Limitations of Neural Networks | Marvin Minsky and Seymour Papert publish their book Perceptrons, describing some of the limitations of perceptrons and neural networks. The interpretation that the book shows neural networks to be fundamentally limited is seen as a hindrance to research into neural networks. [19] |
1970 | | Automatic Differentiation (Backpropagation) | Seppo Linnainmaa publishes the general method for automatic differentiation (AD) of discrete connected networks of nested differentiable functions. [20] [21] This corresponds to the modern version of backpropagation, but is not yet named as such. [22] [23] [24] [25] |
1979 | | Stanford Cart | Students at Stanford University develop a cart that can navigate and avoid obstacles in a room. [2] |
1979 | Discovery | Neocognitron | Kunihiko Fukushima first publishes his work on the neocognitron, a type of artificial neural network (ANN). [26] [27] The neocognitron later inspires convolutional neural networks (CNNs). [28] |
1981 | | Explanation Based Learning | Gerald Dejong introduces explanation-based learning, in which a computer algorithm analyses data and creates a general rule it can follow, discarding unimportant data. [2] |
1982 | Discovery | Recurrent Neural Network | John Hopfield popularizes Hopfield networks, a type of recurrent neural network that can serve as content-addressable memory systems. [29] |
1985 | | NETtalk | Terry Sejnowski develops NETtalk, a program that learns to pronounce words the same way a baby does. [2] |
1986 | Application | Backpropagation | Seppo Linnainmaa's reverse mode of automatic differentiation (first applied to neural networks by Paul Werbos) is used in experiments by David Rumelhart, Geoff Hinton and Ronald J. Williams to learn internal representations. [30] |
1988 | | Universal Approximation Theorem | Kurt Hornik proves that standard multilayer feedforward networks are capable of approximating any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. |
1989 | Discovery | Reinforcement Learning | Christopher Watkins develops Q-learning, which greatly improves the practicality and feasibility of reinforcement learning. [31] |
1989 | Commercialization | Commercialization of Machine Learning on Personal Computers | Axcelis, Inc. releases Evolver, the first software package to commercialize the use of genetic algorithms on personal computers. [32] |
1992 | Achievement | Machines Playing Backgammon | Gerald Tesauro develops TD-Gammon, a computer backgammon program that uses an artificial neural network trained using temporal-difference learning (hence the 'TD' in the name). TD-Gammon is able to rival, but not consistently surpass, the abilities of top human backgammon players. [33] |
1995 | Discovery | Random Forest Algorithm | Tin Kam Ho publishes a paper describing random decision forests. [34] |
1995 | Discovery | Support-Vector Machines | Corinna Cortes and Vladimir Vapnik publish their work on support-vector machines. [35] |
1997 | Achievement | IBM Deep Blue Beats Kasparov | IBM's Deep Blue beats the world champion at chess. [2] |
1997 | Discovery | LSTM | Sepp Hochreiter and Jürgen Schmidhuber invent long short-term memory (LSTM) recurrent neural networks, [36] greatly improving the efficiency and practicality of recurrent neural networks. |
1998 | | MNIST database | A team led by Yann LeCun releases the MNIST database, a dataset comprising a mix of handwritten digits from American Census Bureau employees and American high school students. [37] The MNIST database has since become a benchmark for evaluating handwriting recognition. |
2002 | | Torch Machine Learning Library | Torch, a software library for machine learning, is first released. [38] |
2006 | | The Netflix Prize | The Netflix Prize competition is launched by Netflix. The aim of the competition is to use machine learning to beat the accuracy of Netflix's own recommendation software in predicting a user's rating for a film, given their ratings for previous films, by at least 10%. [39] The prize is won in 2009. |
2009 | Achievement | ImageNet | ImageNet is created. ImageNet is a large visual database envisioned by Fei-Fei Li from Stanford University, who realized that the best machine learning algorithms wouldn't work well if the data didn't reflect the real world. [40] For many, ImageNet was the catalyst for the AI boom [41] of the 21st century. |
2010 | | Kaggle Competition | Kaggle, a website that serves as a platform for machine learning competitions, is launched. [42] |
2011 | Achievement | Beating Humans in Jeopardy | Using a combination of machine learning, natural language processing and information retrieval techniques, IBM's Watson beats two human champions in a Jeopardy! competition. [43] |
2012 | Achievement | Recognizing Cats on YouTube | The Google Brain team, led by Andrew Ng and Jeff Dean, create a neural network that learns to recognize cats by watching unlabeled images taken from frames of YouTube videos. [44] [45] |
2014 | | Leap in Face Recognition | Facebook researchers publish their work on DeepFace, a system that uses neural networks to identify faces with 97.35% accuracy. The results are an improvement of more than 27% over previous systems and rival human performance. [46] |
2014 | | Sibyl | Researchers from Google detail their work on Sibyl, [47] a proprietary platform for massively parallel machine learning used internally by Google to make predictions about user behavior and provide recommendations. [48] |
2016 | Achievement | Beating Humans in Go | Google's AlphaGo program becomes the first computer Go program to beat an unhandicapped professional human player [49] using a combination of machine learning and tree search techniques. [50] It is later improved as AlphaGo Zero and then, in 2017, generalized to chess and other two-player games with AlphaZero. |
2017 | Discovery | Transformer | A team at Google Brain invent the transformer architecture, [51] which allows for faster parallel training of neural networks on sequential data like text. |
2018 | Achievement | Protein Structure Prediction | AlphaFold 1 (2018) places first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) in December 2018. [52] |
2021 | Achievement | Protein Structure Prediction | A team using AlphaFold 2 (2020) repeats the first-place ranking in the CASP competition in November 2020. The team achieves a level of accuracy much higher than any other group, scoring above 90 for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures how similar a computationally predicted structure is to the structure determined by laboratory experiment, with 100 being a complete match, within the distance cutoff used for calculating GDT. [53] |
Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of human beings or animals. AI applications include advanced web search engines, recommendation systems, understanding human speech, self-driving cars, generative or creative tools, and competing at the highest level in strategic games.
Artificial neural networks are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains.
Hypercomputation or super-Turing computation is a set of models of computation that can provide outputs that are not Turing-computable. For example, a machine that could solve the halting problem would be a hypercomputer; so too would one that can correctly evaluate every statement in Peano arithmetic.
Machine learning (ML) is an umbrella term for solving problems for which developing algorithms by human programmers would be cost-prohibitive; instead, the problems are solved by helping machines 'discover' their 'own' algorithms, without being explicitly told what to do by any human-developed algorithm. Recently, generative artificial neural networks have been able to surpass the results of many previous approaches. Machine learning approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture and medicine, where it is too costly to develop algorithms to perform the needed tasks.
Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology.
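A minimal sketch of the idea, using only NumPy and a toy regression task (all names here are illustrative assumptions, not any particular library's API): the weights of a fixed-topology network are mutated and selected by a scalar fitness score rather than trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(params, x):
    # Fixed topology: 1 input -> 4 hidden units (tanh) -> 1 output.
    w1, b1 = params[:4].reshape(1, 4), params[4:8]
    w2, b2 = params[8:12].reshape(4, 1), params[12]
    return np.tanh(x @ w1 + b1) @ w2 + b2

def fitness(params, xs, ys):
    # Only a scalar performance measure is needed, not labeled gradients.
    return -np.mean((forward(params, xs) - ys) ** 2)

xs = np.linspace(-1, 1, 32).reshape(-1, 1)
ys = np.sin(3 * xs)                              # toy regression target

population = [rng.normal(size=13) for _ in range(40)]
for generation in range(300):
    population.sort(key=lambda p: fitness(p, xs, ys), reverse=True)
    parents = population[:10]                    # selection: keep the fittest
    offspring = [p + rng.normal(scale=0.05, size=p.shape)
                 for p in parents for _ in range(3)]
    population = parents + offspring             # mutation-only reproduction

best = max(population, key=lambda p: fitness(p, xs, ys))
print("best fitness:", fitness(best, xs, ys))
```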
Neuromorphic computing is an approach to computing that is inspired by the structure and function of the human brain. A neuromorphic computer/chip is any device that uses physical artificial neurons to do computations. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software systems that implement models of neural systems. The implementation of neuromorphic computing on the hardware level can be realized by oxide-based memristors, spintronic memories, threshold switches, transistors, among others. Training software-based neuromorphic systems of spiking neural networks can be achieved using error backpropagation, e.g., using Python based frameworks such as snnTorch, or using canonical learning rules from the biological learning literature, e.g., using BindsNet.
A cognitive architecture refers to both a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science. The formalized models can be used to further refine a comprehensive theory of cognition and as a useful artificial intelligence program. Successful cognitive architectures include ACT-R and SOAR. The research on cognitive architectures as software instantiation of cognitive theories was initiated by Allen Newell in 1990.
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.
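A minimal sketch of the recurrence described above (illustrative NumPy code): the same weights are reused at every timestep while a hidden state carries information forward.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_size)

def rnn_forward(sequence):
    """Run a simple Elman-style RNN over a sequence of input vectors."""
    h = np.zeros(hidden_size)                    # internal state ("memory")
    states = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)   # same weights applied at every step
        states.append(h)
    return np.array(states)

sequence = rng.normal(size=(7, input_size))      # a length-7 input sequence
print(rnn_forward(sequence).shape)               # (7, 5): one hidden state per timestep
```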
A neural network can refer to either a neural circuit of biological neurons, or a network of artificial neurons or nodes in the case of an artificial neural network. Artificial neural networks are used for solving artificial intelligence (AI) problems; they model connections of biological neurons as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.
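A minimal sketch of such a node (illustrative NumPy code with assumed toy values): inputs are weighted, summed into a linear combination, and passed through an activation function that bounds the output.

```python
import numpy as np

def sigmoid(z):
    # Squashes the linear combination into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(inputs, weights, bias):
    """One node: weight each input, sum (linear combination), then apply an activation."""
    z = np.dot(inputs, weights) + bias
    return sigmoid(z)

inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.8, -0.3, 0.1])   # positive = excitatory, negative = inhibitory
print(artificial_neuron(inputs, weights, bias=0.05))
```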
A long short-term memory (LSTM) network is a recurrent neural network (RNN) aimed at dealing with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps, hence "long short-term memory". It is applicable to classification, processing and predicting data based on time series, such as in handwriting, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.
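A minimal single-step sketch of the gating idea (illustrative NumPy code, not any library's API): forget, input, and output gates control an additive cell state, which is what lets information persist over many timesteps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep: gates decide what to forget, what to write, and what to expose."""
    f = sigmoid(x @ W["f"] + h_prev @ U["f"] + b["f"])   # forget gate
    i = sigmoid(x @ W["i"] + h_prev @ U["i"] + b["i"])   # input gate
    o = sigmoid(x @ W["o"] + h_prev @ U["o"] + b["o"])   # output gate
    g = np.tanh(x @ W["g"] + h_prev @ U["g"] + b["g"])   # candidate cell update
    c = f * c_prev + i * g     # cell state: an additive path that eases gradient flow
    h = o * np.tanh(c)         # hidden state exposed to the rest of the network
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(scale=0.1, size=(n_in, n_hid)) for k in "fiog"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "fiog"}
b = {k: np.zeros(n_hid) for k in "fiog"}

h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)                          # (8,)
```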
This is a timeline of artificial intelligence, sometimes alternatively called synthetic intelligence.
Hava Siegelmann is an American computer scientist and Provost Professor at the University of Massachusetts Amherst.
There are many types of artificial neural networks (ANN).
Josef "Sepp" Hochreiter is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). Previously, he was at the Technical University of Berlin, at the University of Colorado at Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference.
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
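A hedged example of how the dataset is commonly loaded (this assumes TensorFlow/Keras is installed; many other loaders exist). Each sample is a 28x28 grayscale image paired with a digit label.

```python
# Requires TensorFlow; the dataset is downloaded on first use.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)                  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)                    # (10000, 28, 28) (10000,)
print(x_train.dtype, x_train.min(), x_train.max())   # uint8 0 255 (grayscale levels)
```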
AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor.
In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous artificial neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The advantage of a Highway Network over the common deep neural networks is that it solves or partially prevents the vanishing gradient problem, thus leading to easier to optimize neural networks. The gating mechanisms facilitate information flow across many layers.
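A minimal sketch of a single highway layer (illustrative NumPy code): a learned transform gate T blends the layer's transformation H(x) with the unchanged input, the carry path that eases optimization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, where T is the learned transform gate."""
    H = np.tanh(x @ W_h + b_h)      # the layer's usual non-linear transformation
    T = sigmoid(x @ W_t + b_t)      # transform gate in (0, 1)
    return T * H + (1.0 - T) * x    # carry gate C = 1 - T passes the input through

rng = np.random.default_rng(0)
d = 16                                      # highway layers keep the same width
x = rng.normal(size=(1, d))
W_h = rng.normal(scale=0.1, size=(d, d))
W_t = rng.normal(scale=0.1, size=(d, d))
b_h, b_t = np.zeros(d), np.full(d, -2.0)    # negative gate bias initially favors carrying the input
print(highway_layer(x, W_h, b_h, W_t, b_t).shape)   # (1, 16)
```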
Delving into the text of Alexander Pushkin's novel in verse Eugene Onegin, Markov spent hours sifting through patterns of vowels and consonants. On January 23, 1913, he summarized his findings in an address to the Imperial Academy of Sciences in St. Petersburg. His analysis did not alter the understanding or appreciation of Pushkin's poem, but the technique he developed—now known as a Markov chain—extended the theory of probability in a new direction.
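In the spirit of that analysis, a minimal illustrative sketch (not Markov's original procedure) that estimates transition probabilities between vowels and consonants in a short text:

```python
from collections import Counter

VOWELS = set("aeiou")

def transition_probabilities(text):
    """Estimate P(next class | current class) for the two classes vowel (V) and consonant (C)."""
    letters = [c for c in text.lower() if c.isalpha()]
    classes = ["V" if c in VOWELS else "C" for c in letters]
    pair_counts = Counter(zip(classes, classes[1:]))   # counts of adjacent class pairs
    start_counts = Counter(classes[:-1])               # how often each class starts a pair
    return {pair: n / start_counts[pair[0]] for pair, n in pair_counts.items()}

sample = "an example sentence for counting vowel and consonant transitions"
print(transition_probabilities(sample))   # e.g. {('C', 'V'): ..., ('V', 'C'): ..., ...}
```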