This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events in machine learning are included.
Decade | Summary |
---|---|
pre-1950 | Statistical methods are discovered and refined. |
1950s | Pioneering machine learning research is conducted using simple algorithms. |
1960s | Bayesian methods are introduced for probabilistic inference in machine learning. [1] |
1970s | 'AI winter' caused by pessimism about machine learning effectiveness. |
1980s | Rediscovery of backpropagation causes a resurgence in machine learning research. |
1990s | Work on machine learning shifts from a knowledge-driven approach to a data-driven approach. Scientists begin creating programs for computers to analyze large amounts of data and draw conclusions – or "learn" – from the results. [2] Support-vector machines (SVMs) and recurrent neural networks (RNNs) become popular. [3] Research begins on the computational complexity of neural networks and on super-Turing computation. [4] |
2000s | Support-vector clustering [5] and other kernel methods, [6] as well as unsupervised machine learning methods, become widespread. [7] |
2010s | Deep learning becomes feasible, which leads to machine learning becoming integral to many widely used software services and applications. |
Year | Event type | Caption | Event |
---|---|---|---|
1763 | Discovery | The Underpinnings of Bayes' Theorem | Thomas Bayes's work An Essay towards solving a Problem in the Doctrine of Chances is published two years after his death, having been amended and edited by a friend of Bayes, Richard Price. [8] The essay presents work which underpins Bayes' theorem. |
1805 | Discovery | Least Squares | Adrien-Marie Legendre describes the "méthode des moindres carrés", known in English as the least squares method. [9] The least squares method is used widely in data fitting. |
1812 | | Bayes' Theorem | Pierre-Simon Laplace publishes Théorie Analytique des Probabilités, in which he expands upon the work of Bayes and defines what is now known as Bayes' Theorem. [10] |
1913 | Discovery | Markov Chains | Andrey Markov first describes techniques he used to analyse a poem. The techniques later become known as Markov chains. [11] |
1943 | Discovery | Artificial Neuron | Warren McCulloch and Walter Pitts develop a mathematical model that imitates the functioning of a biological neuron, the artificial neuron, which is considered the first neural model invented. [12] |
1950 | | Turing's Learning Machine | Alan Turing proposes a 'learning machine' that can learn and become artificially intelligent. Turing's specific proposal foreshadows genetic algorithms. [13] |
1951 | | First Neural Network Machine | Marvin Minsky and Dean Edmonds build the first neural network machine capable of learning, the SNARC. [14] |
1952 | | Machines Playing Checkers | Arthur Samuel joins IBM's Poughkeepsie Laboratory and begins working on some of the very first machine learning programs, first creating programs that play checkers. [15] |
1957 | Discovery | Perceptron | Frank Rosenblatt invents the perceptron while working at the Cornell Aeronautical Laboratory. [16] The invention of the perceptron generated a great deal of excitement and was widely covered in the media. [17] |
1963 | Achievement | Machines Playing Tic-Tac-Toe | Donald Michie creates a 'machine' consisting of 304 match boxes and beads, which uses reinforcement learning to play Tic-tac-toe (also known as noughts and crosses). [18] |
1967 | | Nearest Neighbor | The nearest neighbor algorithm is created, marking the start of basic pattern recognition. The algorithm is used to map routes. [2] |
1969 | | Limitations of Neural Networks | Marvin Minsky and Seymour Papert publish their book Perceptrons, describing some of the limitations of perceptrons and neural networks. The interpretation that the book shows neural networks to be fundamentally limited is seen as a hindrance to research into neural networks. [19] |
1970 | | Automatic Differentiation (Backpropagation) | Seppo Linnainmaa publishes the general method for automatic differentiation (AD) of discrete connected networks of nested differentiable functions. [20] [21] This corresponds to the modern version of backpropagation, but is not yet named as such. [22] [23] [24] [25] |
1979 | | Stanford Cart | Students at Stanford University develop a cart that can navigate and avoid obstacles in a room. [2] |
1979 | Discovery | Neocognitron | Kunihiko Fukushima first publishes his work on the neocognitron, a type of artificial neural network (ANN). [26] [27] The neocognitron later inspires convolutional neural networks (CNNs). [28] |
1981 | | Explanation Based Learning | Gerald Dejong introduces explanation-based learning, in which a computer algorithm analyses data and creates a general rule it can follow, discarding unimportant data. [2] |
1982 | Discovery | Recurrent Neural Network | John Hopfield popularizes Hopfield networks, a type of recurrent neural network that can serve as content-addressable memory systems. [29] |
1985 | | NETtalk | Terry Sejnowski develops NETtalk, a program that learns to pronounce words the same way a baby does. [2] |
1986 | Application | Backpropagation | Seppo Linnainmaa's reverse mode of automatic differentiation (first applied to neural networks by Paul Werbos) is used in experiments by David Rumelhart, Geoff Hinton and Ronald J. Williams to learn internal representations. [30] |
1988 | | Universal Approximation Theorem | Kurt Hornik proves that standard multilayer feedforward networks are capable of approximating any Borel measurable function from one finite-dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available. |
1989 | Discovery | Reinforcement Learning | Christopher Watkins develops Q-learning, which greatly improves the practicality and feasibility of reinforcement learning. [31] |
1989 | Commercialization | Commercialization of Machine Learning on Personal Computers | Axcelis, Inc. releases Evolver, the first software package to commercialize the use of genetic algorithms on personal computers. [32] |
1992 | Achievement | Machines Playing Backgammon | Gerald Tesauro develops TD-Gammon, a computer backgammon program that uses an artificial neural network trained using temporal-difference learning (hence the 'TD' in the name). TD-Gammon is able to rival, but not consistently surpass, the abilities of top human backgammon players. [33] |
1995 | Discovery | Random Forest Algorithm | Tin Kam Ho publishes a paper describing random decision forests. [34] |
1995 | Discovery | Support-Vector Machines | Corinna Cortes and Vladimir Vapnik publish their work on support-vector machines. [35] |
1997 | Achievement | IBM Deep Blue Beats Kasparov | IBM's Deep Blue beats the world champion at chess. [2] |
1997 | Discovery | LSTM | Sepp Hochreiter and Jürgen Schmidhuber invent long short-term memory (LSTM) recurrent neural networks, [36] greatly improving the efficiency and practicality of recurrent neural networks. |
1998 | | MNIST database | A team led by Yann LeCun releases the MNIST database, a dataset comprising a mix of handwritten digits from American Census Bureau employees and American high school students. [37] The MNIST database has since become a benchmark for evaluating handwriting recognition. |
2002 | | Torch Machine Learning Library | Torch, a software library for machine learning, is first released. [38] |
2006 | | The Netflix Prize | The Netflix Prize competition is launched by Netflix. The aim of the competition is to use machine learning to beat the accuracy of Netflix's own recommendation software in predicting a user's rating for a film, given their ratings for previous films, by at least 10%. [39] The prize is won in 2009. |
2009 | Achievement | ImageNet | ImageNet is created. ImageNet is a large visual database envisioned by Fei-Fei Li from Stanford University, who realized that the best machine learning algorithms wouldn't work well if the data didn't reflect the real world. [40] For many, ImageNet was the catalyst for the AI boom [41] of the 21st century. |
2010 | | Kaggle Competition | Kaggle, a website that serves as a platform for machine learning competitions, is launched. [42] |
2011 | Achievement | Beating Humans in Jeopardy | Using a combination of machine learning, natural language processing and information retrieval techniques, IBM's Watson beats two human champions in a Jeopardy! competition. [43] |
2012 | Achievement | Recognizing Cats on YouTube | The Google Brain team, led by Andrew Ng and Jeff Dean, create a neural network that learns to recognize cats by watching unlabeled images taken from frames of YouTube videos. [44] [45] |
2014 | | Leap in Face Recognition | Facebook researchers publish their work on DeepFace, a system that uses neural networks to identify faces with 97.35% accuracy. The results are an improvement of more than 27% over previous systems and rival human performance. [46] |
2014 | | Sibyl | Researchers from Google detail their work on Sibyl, [47] a proprietary platform for massively parallel machine learning used internally by Google to make predictions about user behavior and provide recommendations. [48] |
2016 | Achievement | Beating Humans in Go | Google's AlphaGo program becomes the first computer Go program to beat an unhandicapped professional human player [49] using a combination of machine learning and tree search techniques. [50] It is later improved as AlphaGo Zero and then, in 2017, generalized to chess and other two-player games with AlphaZero. |
2017 | Discovery | Transformer | A team at Google Brain invent the transformer architecture, [51] which allows for faster parallel training of neural networks on sequential data like text. |
2018 | Achievement | Protein Structure Prediction | AlphaFold 1 (2018) places first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) in December 2018. [52] |
2021 | Achievement | Protein Structure Prediction | A team using AlphaFold 2 (2020) repeats the first-place ranking in the CASP competition in November 2020. The team achieves a level of accuracy much higher than any other group, scoring above 90 for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures how similar a computationally predicted structure is to the structure determined by laboratory experiment, with 100 being a complete match, within the distance cutoff used for calculating GDT. [53] |
Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of human beings or animals. AI applications include advanced web search engines, recommendation systems, understanding human speech, self-driving cars, generative or creative tools, and competing at the highest level in strategic games.
Artificial neural networks are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains.
Hypercomputation or super-Turing computation is a set of models of computation that can provide outputs that are not Turing-computable. For example, a machine that could solve the halting problem would be a hypercomputer; so too would one that can correctly evaluate every statement in Peano arithmetic.
Machine learning (ML) is an umbrella term for solving problems for which developing algorithms by human programmers would be cost-prohibitive; instead, the problems are solved by helping machines 'discover' their 'own' algorithms, without being explicitly told what to do by any human-developed algorithm. Recently, generative artificial neural networks have been able to surpass the results of many previous approaches. Machine learning approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture and medicine, where it is too costly to develop algorithms to perform the needed tasks.
Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.
Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use gradient descent on a neural network with a fixed topology.
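A minimal sketch of the idea, using only NumPy and a toy regression task (all names here are illustrative assumptions, not any particular library's API): the weights of a fixed-topology network are mutated and selected by a scalar fitness score rather than trained by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(params, x):
    # Fixed topology: 1 input -> 4 hidden units (tanh) -> 1 output.
    w1, b1 = params[:4].reshape(1, 4), params[4:8]
    w2, b2 = params[8:12].reshape(4, 1), params[12]
    return np.tanh(x @ w1 + b1) @ w2 + b2

def fitness(params, xs, ys):
    # Only a scalar performance measure is needed, not labeled gradients.
    return -np.mean((forward(params, xs) - ys) ** 2)

xs = np.linspace(-1, 1, 32).reshape(-1, 1)
ys = np.sin(3 * xs)                              # toy regression target

population = [rng.normal(size=13) for _ in range(40)]
for generation in range(300):
    population.sort(key=lambda p: fitness(p, xs, ys), reverse=True)
    parents = population[:10]                    # selection: keep the fittest
    offspring = [p + rng.normal(scale=0.05, size=p.shape)
                 for p in parents for _ in range(3)]
    population = parents + offspring             # mutation-only reproduction

best = max(population, key=lambda p: fitness(p, xs, ys))
print("best fitness:", fitness(best, xs, ys))
```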
Neuromorphic computing is an approach to computing that is inspired by the structure and function of the human brain. A neuromorphic computer/chip is any device that uses physical artificial neurons to do computations. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software systems that implement models of neural systems. The implementation of neuromorphic computing on the hardware level can be realized by oxide-based memristors, spintronic memories, threshold switches, transistors, among others. Training software-based neuromorphic systems of spiking neural networks can be achieved using error backpropagation, e.g., using Python based frameworks such as snnTorch, or using canonical learning rules from the biological learning literature, e.g., using BindsNet.
A cognitive architecture refers to both a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science. The formalized models can be used to further refine a comprehensive theory of cognition and as a useful artificial intelligence program. Successful cognitive architectures include ACT-R and SOAR. The research on cognitive architectures as software instantiation of cognitive theories was initiated by Allen Newell in 1990.
A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by the direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.
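A minimal sketch of the recurrence described above (illustrative NumPy code): the same weights are reused at every timestep while a hidden state carries information forward.

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_size = 3, 5
W_xh = rng.normal(scale=0.1, size=(input_size, hidden_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden (the recurrence)
b_h = np.zeros(hidden_size)

def rnn_forward(sequence):
    """Run a simple Elman-style RNN over a sequence of input vectors."""
    h = np.zeros(hidden_size)                    # internal state ("memory")
    states = []
    for x in sequence:
        h = np.tanh(x @ W_xh + h @ W_hh + b_h)   # same weights applied at every step
        states.append(h)
    return np.array(states)

sequence = rng.normal(size=(7, input_size))      # a length-7 input sequence
print(rnn_forward(sequence).shape)               # (7, 5): one hidden state per timestep
```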
A neural network can refer to either a neural circuit of biological neurons, or a network of artificial neurons or nodes in the case of an artificial neural network. Artificial neural networks are used for solving artificial intelligence (AI) problems; they model connections of biological neurons as weights between nodes. A positive weight reflects an excitatory connection, while negative values mean inhibitory connections. All inputs are modified by a weight and summed. This activity is referred to as a linear combination. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or it could be −1 and 1.
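A minimal sketch of such a node (illustrative NumPy code with assumed toy values): inputs are weighted, summed into a linear combination, and passed through an activation function that bounds the output.

```python
import numpy as np

def sigmoid(z):
    # Squashes the linear combination into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(inputs, weights, bias):
    """One node: weight each input, sum (linear combination), then apply an activation."""
    z = np.dot(inputs, weights) + bias
    return sigmoid(z)

inputs = np.array([0.5, -1.0, 2.0])
weights = np.array([0.8, -0.3, 0.1])   # positive = excitatory, negative = inhibitory
print(artificial_neuron(inputs, weights, bias=0.05))
```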
A long short-term memory (LSTM) network is a recurrent neural network (RNN) aimed at dealing with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps, hence "long short-term memory". It is applicable to classification, processing and predicting data based on time series, such as in handwriting, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.
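A minimal single-step sketch of the gating idea (illustrative NumPy code, not any library's API): forget, input, and output gates control an additive cell state, which is what lets information persist over many timesteps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep: gates decide what to forget, what to write, and what to expose."""
    f = sigmoid(x @ W["f"] + h_prev @ U["f"] + b["f"])   # forget gate
    i = sigmoid(x @ W["i"] + h_prev @ U["i"] + b["i"])   # input gate
    o = sigmoid(x @ W["o"] + h_prev @ U["o"] + b["o"])   # output gate
    g = np.tanh(x @ W["g"] + h_prev @ U["g"] + b["g"])   # candidate cell update
    c = f * c_prev + i * g     # cell state: an additive path that eases gradient flow
    h = o * np.tanh(c)         # hidden state exposed to the rest of the network
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W = {k: rng.normal(scale=0.1, size=(n_in, n_hid)) for k in "fiog"}
U = {k: rng.normal(scale=0.1, size=(n_hid, n_hid)) for k in "fiog"}
b = {k: np.zeros(n_hid) for k in "fiog"}

h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # a length-5 input sequence
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)                          # (8,)
```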
This is a timeline of artificial intelligence, sometimes alternatively called synthetic intelligence.
Hava Siegelmann is an American computer scientist and Provost Professor at the University of Massachusetts Amherst.
There are many types of artificial neural networks (ANN).
Josef "Sepp" Hochreiter is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). Previously, he was at the Technical University of Berlin, at the University of Colorado at Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference.
Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. The adjective "deep" in deep learning refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.
The MNIST database is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. It was created by "re-mixing" the samples from NIST's original datasets. The creators felt that since NIST's training dataset was taken from American Census Bureau employees, while the testing dataset was taken from American high school students, it was not well-suited for machine learning experiments. Furthermore, the black and white images from NIST were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.
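A hedged example of how the dataset is commonly loaded (this assumes TensorFlow/Keras is installed; many other loaders exist). Each sample is a 28x28 grayscale image paired with a digit label.

```python
# Requires TensorFlow; the dataset is downloaded on first use.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)                  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)                    # (10000, 28, 28) (10000,)
print(x_train.dtype, x_train.min(), x_train.max())   # uint8 0 255 (grayscale levels)
```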
AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor.
In machine learning, the Highway Network was the first working very deep feedforward neural network with hundreds of layers, much deeper than previous artificial neural networks. It uses skip connections modulated by learned gating mechanisms to regulate information flow, inspired by Long Short-Term Memory (LSTM) recurrent neural networks. The advantage of a Highway Network over the common deep neural networks is that it solves or partially prevents the vanishing gradient problem, thus leading to easier to optimize neural networks. The gating mechanisms facilitate information flow across many layers.
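A minimal sketch of a single highway layer (illustrative NumPy code): a learned transform gate T blends the layer's transformation H(x) with the unchanged input, the carry path that eases optimization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """y = T(x) * H(x) + (1 - T(x)) * x, where T is the learned transform gate."""
    H = np.tanh(x @ W_h + b_h)      # the layer's usual non-linear transformation
    T = sigmoid(x @ W_t + b_t)      # transform gate in (0, 1)
    return T * H + (1.0 - T) * x    # carry gate C = 1 - T passes the input through

rng = np.random.default_rng(0)
d = 16                                      # highway layers keep the same width
x = rng.normal(size=(1, d))
W_h = rng.normal(scale=0.1, size=(d, d))
W_t = rng.normal(scale=0.1, size=(d, d))
b_h, b_t = np.zeros(d), np.full(d, -2.0)    # negative gate bias initially favors carrying the input
print(highway_layer(x, W_h, b_h, W_t, b_t).shape)   # (1, 16)
```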
Delving into the text of Alexander Pushkin's novel in verse Eugene Onegin, Markov spent hours sifting through patterns of vowels and consonants. On January 23, 1913, he summarized his findings in an address to the Imperial Academy of Sciences in St. Petersburg. His analysis did not alter the understanding or appreciation of Pushkin's poem, but the technique he developed—now known as a Markov chain—extended the theory of probability in a new direction.
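In the spirit of that analysis, a minimal illustrative sketch (not Markov's original procedure) that estimates transition probabilities between vowels and consonants in a short text:

```python
from collections import Counter

VOWELS = set("aeiou")

def transition_probabilities(text):
    """Estimate P(next class | current class) for the two classes vowel (V) and consonant (C)."""
    letters = [c for c in text.lower() if c.isalpha()]
    classes = ["V" if c in VOWELS else "C" for c in letters]
    pair_counts = Counter(zip(classes, classes[1:]))   # counts of adjacent class pairs
    start_counts = Counter(classes[:-1])               # how often each class starts a pair
    return {pair: n / start_counts[pair[0]] for pair, n in pair_counts.items()}

sample = "an example sentence for counting vowel and consonant transitions"
print(transition_probabilities(sample))   # e.g. {('C', 'V'): ..., ('V', 'C'): ..., ...}
```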