Timeline of machine learning

This page is a timeline of machine learning. Major discoveries, achievements, milestones and other major events in machine learning are included.

Overview

Decade | Summary
pre-1950 | Statistical methods are discovered and refined.
1950s | Pioneering machine learning research is conducted using simple algorithms.
1960s | Bayesian methods are introduced for probabilistic inference in machine learning. [1]
1970s | 'AI winter' caused by pessimism about machine learning effectiveness.
1980s | Rediscovery of backpropagation causes a resurgence in machine learning research.
1990s | Work on machine learning shifts from a knowledge-driven approach to a data-driven approach. Scientists begin creating programs for computers to analyze large amounts of data and draw conclusions, or "learn", from the results. [2] Support-vector machines (SVMs) and recurrent neural networks (RNNs) become popular. [3] The fields of computational complexity via neural networks and super-Turing computation emerge. [4]
2000s | Support-vector clustering [5], other kernel methods [6] and unsupervised machine learning methods become widespread. [7]
2010s | Deep learning becomes feasible, which leads to machine learning becoming integral to many widely used software services and applications. Deep learning spurs huge advances in vision and text processing.
2020s | Generative AI leads to revolutionary models, creating a proliferation of foundation models, both proprietary and open source, notably enabling products such as ChatGPT (text-based) and Stable Diffusion (image-based). Machine learning and AI enter the wider public consciousness. The commercial potential of AI based on machine learning causes large increases in valuations of companies linked to AI.

Timeline

Year | Event type | Caption | Event
1763 | Discovery | The Underpinnings of Bayes' Theorem | Thomas Bayes's work An Essay towards solving a Problem in the Doctrine of Chances is published two years after his death, having been amended and edited by a friend of Bayes, Richard Price. [8] The essay presents work which underpins Bayes' theorem.
1805 | Discovery | Least Squares | Adrien-Marie Legendre describes the "méthode des moindres carrés", known in English as the least squares method. [9] The least squares method is used widely in data fitting.
1812 | | Bayes' Theorem | Pierre-Simon Laplace publishes Théorie Analytique des Probabilités, in which he expands upon the work of Bayes and defines what is now known as Bayes' Theorem. [10]
1913 | Discovery | Markov Chains | Andrey Markov first describes techniques he used to analyse a poem. The techniques later become known as Markov chains. [11]
1943 | Discovery | Artificial Neuron | Warren McCulloch and Walter Pitts develop a mathematical model that imitates the functioning of a biological neuron: the artificial neuron, which is considered to be the first neural model invented. [12]
1950 | | Turing's Learning Machine | Alan Turing proposes a 'learning machine' that could learn and become artificially intelligent. Turing's specific proposal foreshadows genetic algorithms. [13]
1951 | | First Neural Network Machine | Marvin Minsky and Dean Edmonds build the SNARC, the first neural network machine able to learn. [14]
1952 | | Machines Playing Checkers | Arthur Samuel joins IBM's Poughkeepsie Laboratory and begins working on some of the very first machine learning programs, first creating programs that play checkers. [15]
1957 | Discovery | Perceptron | Frank Rosenblatt invents the perceptron while working at the Cornell Aeronautical Laboratory. [16] The invention of the perceptron generates a great deal of excitement and is widely covered in the media. [17]
1963 | Achievement | Machines Playing Tic-Tac-Toe | Donald Michie creates a 'machine' consisting of 304 matchboxes and beads, which uses reinforcement learning to play tic-tac-toe (also known as noughts and crosses). [18]
1967 | | Nearest Neighbor | The nearest neighbour algorithm is created, marking the start of basic pattern recognition. The algorithm is used to map routes. [2]
1969 | | Limitations of Neural Networks | Marvin Minsky and Seymour Papert publish their book Perceptrons, describing some of the limitations of perceptrons and neural networks. The interpretation that the book shows that neural networks are fundamentally limited is seen as a hindrance for research into neural networks. [19]
1970 | | Automatic Differentiation (Backpropagation) | Seppo Linnainmaa publishes the general method for automatic differentiation (AD) of discrete connected networks of nested differentiable functions. [20] [21] This corresponds to the modern version of backpropagation, but is not yet named as such. [22] [23] [24] [25]
1979 | | Stanford Cart | Students at Stanford University develop a cart that can navigate and avoid obstacles in a room. [2]
1979 | Discovery | Neocognitron | Kunihiko Fukushima first publishes his work on the neocognitron, a type of artificial neural network (ANN). [26] [27] The neocognitron later inspires convolutional neural networks (CNNs). [28]
1981 | | Explanation Based Learning | Gerald DeJong introduces Explanation Based Learning, in which a computer algorithm analyses data and creates a general rule it can follow, discarding unimportant data. [2]
1982 | Discovery | Recurrent Neural Network | John Hopfield popularizes Hopfield networks, a type of recurrent neural network that can serve as content-addressable memory systems. [29]
1985 | | NETtalk | Terry Sejnowski develops a program that learns to pronounce words the same way a baby does. [2]
1986 | Application | Backpropagation | Seppo Linnainmaa's reverse mode of automatic differentiation (first applied to neural networks by Paul Werbos) is used in experiments by David Rumelhart, Geoff Hinton and Ronald J. Williams to learn internal representations. [30]
1988 | | Universal approximation theorem | Kurt Hornik proves that standard multilayer feedforward networks are capable of approximating any Borel measurable function from one finite dimensional space to another to any desired degree of accuracy, provided sufficiently many hidden units are available.
1989 | Discovery | Reinforcement Learning | Christopher Watkins develops Q-learning, which greatly improves the practicality and feasibility of reinforcement learning. [31]
1989 | Commercialization | Commercialization of Machine Learning on Personal Computers | Axcelis, Inc. releases Evolver, the first software package to commercialize the use of genetic algorithms on personal computers. [32]
1992 | Achievement | Machines Playing Backgammon | Gerald Tesauro develops TD-Gammon, a computer backgammon program that uses an artificial neural network trained using temporal-difference learning (hence the 'TD' in the name). TD-Gammon is able to rival, but not consistently surpass, the abilities of top human backgammon players. [33]
1995 | Discovery | Random Forest Algorithm | Tin Kam Ho publishes a paper describing random decision forests. [34]
1995 | Discovery | Support-Vector Machines | Corinna Cortes and Vladimir Vapnik publish their work on support-vector machines. [35]
1997 | Achievement | IBM Deep Blue Beats Kasparov | IBM's Deep Blue beats the world chess champion, Garry Kasparov. [2]
1997 | Discovery | LSTM | Sepp Hochreiter and Jürgen Schmidhuber invent long short-term memory (LSTM) recurrent neural networks, [36] greatly improving the efficiency and practicality of recurrent neural networks.
1998 | | MNIST database | A team led by Yann LeCun releases the MNIST database, a dataset comprising a mix of handwritten digits from American Census Bureau employees and American high school students. [37] The MNIST database has since become a benchmark for evaluating handwriting recognition.
2002 | | Torch Machine Learning Library | Torch, a software library for machine learning, is first released. [38]
2006 | | The Netflix Prize | The Netflix Prize competition is launched by Netflix. The aim of the competition is to use machine learning to beat the accuracy of Netflix's own recommendation software in predicting a user's rating for a film, given their ratings for previous films, by at least 10%. [39] The prize was won in 2009.
2009 | Achievement | ImageNet | ImageNet is created. ImageNet is a large visual database envisioned by Fei-Fei Li from Stanford University, who realized that the best machine learning algorithms wouldn't work well if the data didn't reflect the real world. [40] For many, ImageNet was the catalyst for the AI boom [41] of the 21st century.
2010 | | Kaggle Competition | Kaggle, a website that serves as a platform for machine learning competitions, is launched. [42]
2011 | Achievement | Beating Humans in Jeopardy! | Using a combination of machine learning, natural language processing and information retrieval techniques, IBM's Watson beats two human champions in a Jeopardy! competition. [43]
2012 | Achievement | Recognizing Cats on YouTube | The Google Brain team, led by Andrew Ng and Jeff Dean, creates a neural network that learns to recognize cats by watching unlabeled images taken from frames of YouTube videos. [44] [45]
2012 | Discovery | Visual Recognition | The AlexNet paper and algorithm achieve breakthrough results in image recognition on the ImageNet benchmark. This popularizes deep neural networks. [46]
2013 | Discovery | Word Embeddings | A widely cited paper nicknamed word2vec shows how each word can be converted into a sequence of numbers (a word embedding). The use of these vectors revolutionizes text processing in machine learning. [47]
2014 | | Leap in Face Recognition | Facebook researchers publish their work on DeepFace, a system that uses neural networks to identify faces with 97.35% accuracy. The results are an improvement of more than 27% over previous systems and rival human performance. [48]
2014 | | Sibyl | Researchers from Google detail their work on Sibyl, [49] a proprietary platform for massively parallel machine learning used internally by Google to make predictions about user behavior and provide recommendations. [50]
2016 | Achievement | Beating Humans in Go | Google's AlphaGo program becomes the first computer Go program to beat an unhandicapped professional human player [51] using a combination of machine learning and tree search techniques. [52] AlphaGo is later improved as AlphaGo Zero and then, in 2017, generalized to chess and other two-player games as AlphaZero.
2017 | Discovery | Transformer | A team at Google Brain invents the transformer architecture, [53] which allows for faster parallel training of neural networks on sequential data like text.
2018 | Achievement | Protein Structure Prediction | AlphaFold 1 places first in the overall rankings of the 13th Critical Assessment of Techniques for Protein Structure Prediction (CASP) in December 2018. [54]
2021 | Achievement | Protein Structure Prediction | A team using AlphaFold 2 repeats the first-place finish in the CASP competition in November 2020, achieving a level of accuracy much higher than any other group. It scores above 90 for around two-thirds of the proteins in CASP's global distance test (GDT), a test that measures the degree to which a computationally predicted structure is similar to the experimentally determined structure, with 100 being a complete match within the distance cutoff used for calculating GDT. [55]

See also

Related Research Articles

Artificial intelligence: Intelligence of machines or software

Artificial intelligence (AI) is the intelligence of machines or software, as opposed to the intelligence of humans or other animals. It is a field of study in computer science that develops and studies intelligent machines. Such machines may be called AIs.

Artificial neural network: Computational model used in machine learning, based on connected, hierarchical functions

Artificial neural networks are a branch of machine learning models that are built using principles of neuronal organization discovered by connectionism in the biological neural networks constituting animal brains.

Jürgen Schmidhuber: German computer scientist

Jürgen Schmidhuber is a German computer scientist noted for his work in the field of artificial intelligence, specifically artificial neural networks. He is a scientific director of the Dalle Molle Institute for Artificial Intelligence Research in Switzerland. He is also director of the Artificial Intelligence Initiative and professor of the Computer Science program in the Computer, Electrical, and Mathematical Sciences and Engineering (CEMSE) division at the King Abdullah University of Science and Technology (KAUST) in Saudi Arabia.

Neuroevolution, or neuro-evolution, is a form of artificial intelligence that uses evolutionary algorithms to generate artificial neural networks (ANN), parameters, and rules. It is most commonly applied in artificial life, general game playing and evolutionary robotics. The main benefit is that neuroevolution can be applied more widely than supervised learning algorithms, which require a syllabus of correct input-output pairs. In contrast, neuroevolution requires only a measure of a network's performance at a task. For example, the outcome of a game can be easily measured without providing labeled examples of desired strategies. Neuroevolution is commonly used as part of the reinforcement learning paradigm, and it can be contrasted with conventional deep learning techniques that use backpropagation with a fixed topology.

Neuromorphic computing is an approach to computing that is inspired by the structure and function of the human brain. A neuromorphic computer/chip is any device that uses physical artificial neurons to do computations. In recent times, the term neuromorphic has been used to describe analog, digital, mixed-mode analog/digital VLSI, and software systems that implement models of neural systems. The implementation of neuromorphic computing on the hardware level can be realized by oxide-based memristors, spintronic memories, threshold switches, transistors, among others. Training software-based neuromorphic systems of spiking neural networks can be achieved using error backpropagation, e.g., using Python based frameworks such as snnTorch, or using canonical learning rules from the biological learning literature, e.g., using BindsNet.

A cognitive architecture refers to both a theory about the structure of the human mind and to a computational instantiation of such a theory used in the fields of artificial intelligence (AI) and computational cognitive science. The formalized models can be used to further refine a comprehensive theory of cognition and as a useful artificial intelligence program. Successful cognitive architectures include ACT-R and SOAR. The research on cognitive architectures as software instantiation of cognitive theories was initiated by Allen Newell in 1990.

A recurrent neural network (RNN) is one of the two broad types of artificial neural network, characterized by direction of the flow of information between its layers. In contrast to the uni-directional feedforward neural network, it is a bi-directional artificial neural network, meaning that it allows the output from some nodes to affect subsequent input to the same nodes. Their ability to use internal state (memory) to process arbitrary sequences of inputs makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. The term "recurrent neural network" is used to refer to the class of networks with an infinite impulse response, whereas "convolutional neural network" refers to the class of finite impulse response. Both classes of networks exhibit temporal dynamic behavior. A finite impulse recurrent network is a directed acyclic graph that can be unrolled and replaced with a strictly feedforward neural network, while an infinite impulse recurrent network is a directed cyclic graph that cannot be unrolled.
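
The feedback described above can be illustrated with a minimal sketch of a single recurrent unit. The function names, the scalar weights (w_in, w_rec) and the tanh activation are assumptions chosen for illustration and are not taken from any particular library or paper.

```python
import math

def rnn_step(x, h_prev, w_in, w_rec, bias):
    """One recurrent update: the new state depends on the current input
    and on the unit's own previous output (hypothetical scalar weights)."""
    return math.tanh(w_in * x + w_rec * h_prev + bias)

def run_rnn(sequence, w_in=0.5, w_rec=0.9, bias=0.0):
    """Feed a sequence through the unit, carrying the state forward."""
    h = 0.0  # internal state (memory), initially empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states

# A single non-zero input followed by zeros: the state stays non-zero
# on later steps because the earlier output is fed back in.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

With w_rec set to 0 the unit would behave like a feedforward unit and output 0 for every zero input; the recurrent weight is what lets the first input influence later steps.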

Long short-term memory: Artificial recurrent neural network architecture used in deep learning

A long short-term memory (LSTM) network is a recurrent neural network (RNN) designed to deal with the vanishing gradient problem present in traditional RNNs. Its relative insensitivity to gap length is its advantage over other RNNs, hidden Markov models and other sequence learning methods. It aims to provide a short-term memory for RNNs that can last thousands of timesteps, thus "long short-term memory". It is applicable to classification, processing and predicting data based on time series, such as in handwriting, speech recognition, machine translation, speech activity detection, robot control, video games, and healthcare.

This is a timeline of artificial intelligence, sometimes alternatively called synthetic intelligence.

Kunihiko Fukushima is a Japanese computer scientist, most noted for his work on artificial neural networks and deep learning. He is currently working part-time as a senior research scientist at the Fuzzy Logic Systems Institute in Fukuoka, Japan.

There are many types of artificial neural networks (ANN).

Sepp Hochreiter: German computer scientist

Josef "Sepp" Hochreiter is a German computer scientist. Since 2018 he has led the Institute for Machine Learning at the Johannes Kepler University of Linz after having led the Institute of Bioinformatics from 2006 to 2018. In 2017 he became the head of the Linz Institute of Technology (LIT) AI Lab. Hochreiter is also a founding director of the Institute of Advanced Research in Artificial Intelligence (IARAI). Previously, he was at the Technical University of Berlin, at the University of Colorado at Boulder, and at the Technical University of Munich. He is a chair of the Critical Assessment of Massive Data Analysis (CAMDA) conference.

Deep learning: Branch of machine learning

Deep learning is the subset of machine learning methods based on artificial neural networks (ANNs) with representation learning. The adjective "deep" refers to the use of multiple layers in the network. Methods used can be either supervised, semi-supervised or unsupervised.

An artificial neural network's learning rule or learning process is a method, mathematical logic or algorithm which improves the network's performance and/or training time. Usually, this rule is applied repeatedly over the network. It is done by updating the weights and bias levels of a network when a network is simulated in a specific data environment. A learning rule may accept existing conditions of the network and will compare the expected result and actual result of the network to give new and improved values for weights and bias. Depending on the complexity of the actual model being simulated, the learning rule of the network can be as simple as an XOR gate or mean squared error, or as complex as the result of a system of differential equations.
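
As one concrete instance of such a rule, the following is a minimal sketch of the classic perceptron learning rule applied to the logical AND function. The function names, the learning rate and the number of epochs are illustrative assumptions; other learning rules update the weights differently.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """Train a single threshold unit with the perceptron learning rule.

    samples is a list of (inputs, expected) pairs with expected in {0, 1}.
    epochs and lr are hypothetical defaults chosen for this small example.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, expected in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            actual = 1 if activation > 0 else 0
            # Compare expected and actual result, then nudge weights and bias.
            error = expected - actual
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    """Apply the trained threshold unit to one input vector."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0

# Logical AND is linearly separable, so the rule settles on a solution.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_SAMPLES)
print([predict(weights, bias, x) for x, _ in AND_SAMPLES])
```

The update is applied repeatedly over the training data, and it changes nothing once the actual result matches the expected one, which is why training stops moving after the unit classifies every sample correctly.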

This glossary of artificial intelligence is a list of definitions of terms and concepts relevant to the study of artificial intelligence, its sub-disciplines, and related fields. Related glossaries include Glossary of computer science, Glossary of robotics, and Glossary of machine vision.

An AI accelerator, deep learning processor, or neural processing unit is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFET transistors.

AlexNet: Convolutional neural network

AlexNet is the name of a convolutional neural network (CNN) architecture, designed by Alex Krizhevsky in collaboration with Ilya Sutskever and Geoffrey Hinton, who was Krizhevsky's Ph.D. advisor at the University of Toronto.

Explainable AI (XAI), often overlapping with Interpretable AI, or Explainable Machine Learning (XML), either refers to an AI system over which it is possible for humans to retain intellectual oversight, or to the methods to achieve this. The main focus is usually on the reasoning behind the decisions or predictions made by the AI which are made more understandable and transparent. XAI counters the "black box" tendency of machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.

Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks. Their creation was inspired by neural circuitry. While some of the computational implementations of ANNs relate to earlier discoveries in mathematics, the first implementation of ANNs was by psychologist Frank Rosenblatt, who developed the perceptron. Little research was conducted on ANNs in the 1970s and 1980s, with the AAAI calling that period an "AI winter".

Juyang Weng

Juyang (John) Weng is a Chinese-American computer engineer, neuroscientist, author, and academic. He is a former professor at the Department of Computer Science and Engineering at Michigan State University and the President of Brain-Mind Institute and GENISAMA.

References

Citations

  1. Solomonoff, R.J. (June 1964). "A formal theory of inductive inference. Part II". Information and Control. 7 (2): 224–254. doi:10.1016/S0019-9958(64)90131-7.
  2. Marr 2016.
  3. Siegelmann, H.T.; Sontag, E.D. (February 1995). "On the Computational Power of Neural Nets". Journal of Computer and System Sciences. 50 (1): 132–150. doi:10.1006/jcss.1995.1013.
  4. Siegelmann, Hava (1995). "Computation Beyond the Turing Limit". Science. 268 (5210): 545–548. Bibcode:1995Sci...268..545S. doi:10.1126/science.268.5210.545. PMID 17756722. S2CID 17495161.
  5. Ben-Hur, Asa; Horn, David; Siegelmann, Hava; Vapnik, Vladimir (2001). "Support vector clustering". Journal of Machine Learning Research. 2: 51–86.
  6. Hofmann, Thomas; Schölkopf, Bernhard; Smola, Alexander J. (2008). "Kernel methods in machine learning". The Annals of Statistics. 36 (3): 1171–1220. arXiv:math/0701907. doi:10.1214/009053607000000677. JSTOR 25464664.
  7. Bennett, James; Lanning, Stan (2007). "The netflix prize" (PDF). Proceedings of KDD Cup and Workshop 2007.
  8. Bayes, Thomas (1 January 1763). "An Essay towards solving a Problem in the Doctrine of Chances". Philosophical Transactions. 53: 370–418. doi:10.1098/rstl.1763.0053. JSTOR 105741.
  9. Legendre, Adrien-Marie (1805). Nouvelles méthodes pour la détermination des orbites des comètes (in French). Paris: Firmin Didot. p. viii. Retrieved 13 June 2016.
  10. O'Connor, J J; Robertson, E F. "Pierre-Simon Laplace". School of Mathematics and Statistics, University of St Andrews, Scotland. Retrieved 15 June 2016.
  11. Langston, Nancy (2013). "Mining the Boreal North". American Scientist. 101 (2): 1. doi:10.1511/2013.101.1. Delving into the text of Alexander Pushkin's novel in verse Eugene Onegin, Markov spent hours sifting through patterns of vowels and consonants. On January 23, 1913, he summarized his findings in an address to the Imperial Academy of Sciences in St. Petersburg. His analysis did not alter the understanding or appreciation of Pushkin's poem, but the technique he developed—now known as a Markov chain—extended the theory of probability in a new direction.
  12. McCulloch, Warren S.; Pitts, Walter (December 1943). "A logical calculus of the ideas immanent in nervous activity". The Bulletin of Mathematical Biophysics. 5 (4): 115–133. doi:10.1007/BF02478259.
  13. Turing, A. M. (1 October 1950). "I.—COMPUTING MACHINERY AND INTELLIGENCE". Mind. LIX (236): 433–460. doi:10.1093/mind/LIX.236.433.
  14. Crevier 1993, pp. 34–35 and Russell & Norvig 2003, p. 17.
  15. McCarthy, J.; Feigenbaum, E. (1 September 1990). "In memoriam—Arthur Samuel (1901–1990)". AI Magazine. 11 (3): 10–11.
  16. Rosenblatt, F. (1958). "The perceptron: A probabilistic model for information storage and organization in the brain". Psychological Review. 65 (6): 386–408. CiteSeerX 10.1.1.588.3775. doi:10.1037/h0042519. PMID 13602029. S2CID 12781225.
  17. Mason, Harding; Stewart, D; Gill, Brendan (6 December 1958). "Rival". The New Yorker. Retrieved 5 June 2016.
  18. Child, Oliver (13 March 2016). "Menace: the Machine Educable Noughts And Crosses Engine Read". Chalkdust Magazine. Retrieved 16 Jan 2018.
  19. Cohen, Harvey. "The Perceptron". Retrieved 5 June 2016.
  20. Linnainmaa, Seppo (1970). Algoritmin kumulatiivinen pyoristysvirhe yksittaisten pyoristysvirheiden taylor-kehitelmana [The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors] (PDF) (Thesis) (in Finnish). pp. 6–7.
  21. Linnainmaa, Seppo (1976). "Taylor expansion of the accumulated rounding error". BIT Numerical Mathematics. 16 (2): 146–160. doi:10.1007/BF01931367. S2CID 122357351.
  22. Griewank, Andreas (2012). "Who Invented the Reverse Mode of Differentiation?". Documenta Mathematica, Extra Volume ISMP: 389–400.
  23. Griewank, Andreas; Walther, A. (2008). Principles and Techniques of Algorithmic Differentiation (Second ed.). SIAM. ISBN 978-0898716597.
  24. Schmidhuber, Jürgen (2015). "Deep learning in neural networks: An overview". Neural Networks. 61: 85–117. arXiv:1404.7828. Bibcode:2014arXiv1404.7828S. doi:10.1016/j.neunet.2014.09.003. PMID 25462637. S2CID 11715509.
  25. Schmidhuber, Jürgen (2015). "Deep Learning (Section on Backpropagation)". Scholarpedia. 10 (11): 32832. Bibcode:2015SchpJ..1032832S. doi:10.4249/scholarpedia.32832.
  26. Fukushima, Kunihiko (October 1979). "位置ずれに影響されないパターン認識機構の神経回路のモデル --- ネオコグニトロン ---" [Neural network model for a mechanism of pattern recognition unaffected by shift in position — Neocognitron —]. Trans. IECE (in Japanese). J62-A (10): 658–665.
  27. Fukushima, Kunihiko (April 1980). "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position". Biological Cybernetics. 36 (4): 193–202. doi:10.1007/BF00344251. PMID 7370364. S2CID 206775608.
  28. Le Cun, Yann. "Deep Learning". CiteSeerX 10.1.1.297.6176.
  29. Hopfield, J J (April 1982). "Neural networks and physical systems with emergent collective computational abilities". Proceedings of the National Academy of Sciences. 79 (8): 2554–2558. Bibcode:1982PNAS...79.2554H. doi:10.1073/pnas.79.8.2554. PMC 346238. PMID 6953413.
  30. Rumelhart, David E.; Hinton, Geoffrey E.; Williams, Ronald J. (October 1986). "Learning representations by back-propagating errors". Nature. 323 (6088): 533–536. Bibcode:1986Natur.323..533R. doi:10.1038/323533a0. S2CID 205001834.
  31. Watkins, Christopher (1 May 1989). "Learning from Delayed Rewards" (PDF).
  32. Markoff, John (29 August 1990). "BUSINESS TECHNOLOGY; What's the Best Answer? It's Survival of the Fittest". New York Times. Retrieved 8 June 2016.
  33. Tesauro, Gerald (March 1995). "Temporal difference learning and TD-Gammon". Communications of the ACM. 38 (3): 58–68. doi:10.1145/203330.203343. S2CID 8763243.
  34. Tin Kam Ho (1995). "Random decision forests". Proceedings of 3rd International Conference on Document Analysis and Recognition. Vol. 1. pp. 278–282. doi:10.1109/ICDAR.1995.598994. ISBN 0-8186-7128-9.
  35. Cortes, Corinna; Vapnik, Vladimir (September 1995). "Support-vector networks". Machine Learning. 20 (3): 273–297. doi:10.1007/BF00994018.
  36. Hochreiter, Sepp; Schmidhuber, Jürgen (1 November 1997). "Long Short-Term Memory". Neural Computation. 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276. S2CID 1915014.
  37. LeCun, Yann; Cortes, Corinna; Burges, Christopher. "THE MNIST DATABASE of handwritten digits". Retrieved 16 June 2016.
  38. Collobert, Ronan; Bengio, Samy; Mariethoz, Johnny (30 October 2002). "Torch: a modular machine learning software library" (PDF). Retrieved 5 June 2016.
  39. "The Netflix Prize Rules". Netflix Prize. Netflix. Archived from the original on 3 March 2012. Retrieved 16 June 2016.
  40. Gershgorn, Dave (26 July 2017). "ImageNet: the data that spawned the current AI boom — Quartz". qz.com. Retrieved 2018-03-30.
  41. Hardy, Quentin (18 July 2016). "Reasons to Believe the A.I. Boom Is Real". The New York Times.
  42. "About". Kaggle. Kaggle Inc. Retrieved 16 June 2016.
  43. Markoff, John (16 February 2011). "Computer Wins on 'Jeopardy!': Trivial, It's Not". The New York Times. p. A1.
  44. Le, Quoc V. (2013). "Building high-level features using large scale unsupervised learning". 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 8595–8598. doi:10.1109/ICASSP.2013.6639343. ISBN 978-1-4799-0356-6. S2CID 206741597.
  45. Markoff, John (26 June 2012). "How Many Computers to Identify a Cat? 16,000". New York Times. p. B1. Retrieved 5 June 2016.
  46. "The data that transformed AI research—and possibly the world". Quartz. 2017-07-26. Retrieved 2023-09-12.
  47. Ataee, Pedram (2022-07-03). "Word2Vec Models are Simple Yet Revolutionary". Medium. Retrieved 2023-09-12.
  48. Taigman, Yaniv; Yang, Ming; Ranzato, Marc'Aurelio; Wolf, Lior (24 June 2014). "DeepFace: Closing the Gap to Human-Level Performance in Face Verification". Conference on Computer Vision and Pattern Recognition. Retrieved 8 June 2016.
  49. Canini, Kevin; Chandra, Tushar; Ie, Eugene; McFadden, Jim; Goldman, Ken; Gunter, Mike; Harmsen, Jeremiah; LeFevre, Kristen; Lepikhin, Dmitry; Llinares, Tomas Lloret; Mukherjee, Indraneel; Pereira, Fernando; Redstone, Josh; Shaked, Tal; Singer, Yoram. "Sibyl: A system for large scale supervised machine learning" (PDF). Jack Baskin School of Engineering. UC Santa Cruz. Retrieved 8 June 2016.
  50. Woodie, Alex (17 July 2014). "Inside Sibyl, Google's Massively Parallel Machine Learning Platform". Datanami. Tabor Communications. Retrieved 8 June 2016.
  51. "Google achieves AI 'breakthrough' by beating Go champion". BBC News. BBC. 27 January 2016. Retrieved 5 June 2016.
  52. "AlphaGo". Google DeepMind. Google Inc. Retrieved 5 June 2016.
  53. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Lukasz; Polosukhin, Illia (2017). "Attention Is All You Need". arXiv:1706.03762.
  54. Sample, Ian (2 December 2018). "Google's DeepMind predicts 3D shapes of proteins". The Guardian.
  55. Eisenstein, Michael (23 November 2021). "Artificial intelligence powers protein-folding predictions". Nature. 599 (7886): 706–708. doi:10.1038/d41586-021-03499-y. S2CID 244528561.

Works cited