Richard S. Sutton

Richard S. Sutton
Sutton in 2021
Born 1957 or 1958 (age 67–68)
Ohio, U.S.
Citizenship Canadian
Education Stanford University (BA)
University of Massachusetts, Amherst (MS, PhD)
Known for Temporal difference learning, Dyna, Options, GQ(λ)
Awards AAAI Fellow (2001)
President's Award (INNS) (2003)
Royal Society of Canada Fellow (2016)
Turing Award (2024)
Scientific career
Fields Artificial intelligence
Reinforcement learning
Institutions University of Alberta
Thesis Temporal credit assignment in reinforcement learning (1984)
Doctoral advisor Andrew Barto
Doctoral students David Silver
Doina Precup
Website incompleteideas.net

Richard Stuart Sutton FRS FRSC (born 1957 or 1958) is a Canadian computer scientist. He is a professor of computing science at the University of Alberta, a fellow and Chief Scientific Advisor at the Alberta Machine Intelligence Institute, and a research scientist at Keen Technologies.[1] Sutton is considered one of the founders of modern computational reinforcement learning.[2] In particular, he contributed to temporal difference learning and policy gradient methods.[3] He received the 2024 Turing Award together with Andrew Barto.[4][5]

Life and education

Richard Sutton was born in either 1957 or 1958[6][7] in Ohio, and grew up in Oak Brook, Illinois, a suburb of Chicago.[8]

Sutton received his B.A. in psychology from Stanford University in 1978 before earning an MS (1980) and a PhD (1984) in computer science from the University of Massachusetts Amherst under the supervision of Andrew Barto. His doctoral dissertation, Temporal Credit Assignment in Reinforcement Learning, introduced actor-critic architectures and temporal credit assignment.[9][3]

He was influenced by A. Harry Klopf's work in the 1970s, which proposed that supervised learning is insufficient for achieving AI or explaining intelligent behavior, and that trial-and-error learning, driven by the "hedonic aspects of behavior", is necessary. This focused his interest on reinforcement learning.[10]

Sutton interviewed by Steve Jurvetson on AlphaGo in 2017

Career

Sutton held a postdoctoral position at the University of Massachusetts Amherst in 1984.[11] He worked at GTE Laboratories in Waltham, Massachusetts, as a principal member of technical staff from 1985 to 1994, then returned to the University of Massachusetts Amherst as a senior research scientist.[12] He was a principal technical staff member at AT&T Labs' Shannon Laboratory in Florham Park, New Jersey, from 1998 to 2002.[5] He has been a professor of computing science at the University of Alberta since 2003, where he helped establish the Reinforcement Learning and Artificial Intelligence Laboratory.[13] In 2017 he became a distinguished research scientist with DeepMind and helped launch DeepMind Alberta in Edmonton, a research office operated in close collaboration with the University of Alberta.[14] He was elected Fellow of the Royal Society of Canada in 2016 and Fellow of the Royal Society in 2021.[15][16][3][17]

Originally an American citizen, Sutton became a Canadian citizen in 2015.[17]

Reinforcement learning

Sutton joined Andrew Barto at UMass Amherst in the early 1980s to explore the behavior of neurons in the human brain as a basis for human intelligence, a concept that had been advanced by computer scientist A. Harry Klopf. Sutton and Barto developed the mathematics of the concept and used it as a basis for artificial intelligence. The approach became known as reinforcement learning and went on to become a key part of artificial intelligence techniques.[18]

Barto and Sutton used Markov decision processes (MDPs) as the mathematical foundation for explaining how agents (algorithmic entities) make decisions in a stochastic, or random, environment, receiving a reward after each action. Traditional MDP theory assumed that agents knew everything about the MDP while attempting to maximize their cumulative rewards. Barto and Sutton's reinforcement learning techniques allowed both the environment and the rewards to be unknown, which let this category of algorithms be applied to a wide array of problems.[19]
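
The practical difference can be illustrated with tabular Q-learning, a model-free algorithm due to Chris Watkins and covered at length in Sutton and Barto's textbook: the agent never sees the MDP's transition or reward functions, only sampled transitions. The sketch below is illustrative rather than drawn from a specific publication; the environment interface (reset, step, actions) and all hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learns action values from sampled transitions
    alone, with the MDP's transition and reward functions left unknown.
    Assumes (hypothetically) that `env` exposes reset() -> state,
    step(action) -> (next_state, reward, done), and a list env.actions."""
    q = defaultdict(float)  # (state, action) -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: usually exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Bootstrap from the best estimated value of the next state;
            # no model of the environment is ever consulted.
            best_next = max(q[(next_state, a)] for a in env.actions)
            target = reward + (0.0 if done else gamma * best_next)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q
```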

Sutton returned to Canada in the 2000s and continued working on the topic, which kept developing in academic circles until one of its first major real-world applications: Google's AlphaGo program, built on this concept, defeated the reigning human champion.[18] Barto and Sutton are widely credited as pioneers of modern reinforcement learning, a technique that is foundational to the modern AI boom.[20]

In a 2019 essay, Sutton proposed the "bitter lesson", criticizing the field of AI research for failing to learn that "building in how we think we think does not work in the long run". He argued that "70 years of AI research [had shown] that general methods that leverage computation are ultimately the most effective, and by a large margin", outperforming efforts built on human knowledge of specific domains such as computer vision, speech recognition, chess, or Go.[21][22]

Sutton argues that large language models are not capable of learning on the job, so new model architectures are required to enable continual learning.[23][non-primary source needed] He further argues that a separate training phase will be unnecessary: the agent will learn on the fly, rendering large language models obsolete.[23]

In 2023, Sutton and John Carmack announced a partnership for the development of artificial general intelligence (AGI).[1]

Awards and honors

Sutton has been a fellow of the Association for the Advancement of Artificial Intelligence (AAAI) since 2001;[24] his nomination read: "For significant contributions to many topics in machine learning, including reinforcement learning, temporal difference techniques, and neural networks."[24] In 2003, he received the President's Award from the International Neural Network Society,[25] and in 2013, the Outstanding Achievement in Research award from the University of Massachusetts Amherst.[26] He received the 2024 Turing Award from the Association for Computing Machinery together with Andrew Barto; the award citation read: "For developing the conceptual and algorithmic foundations of reinforcement learning."[4][27]

In 2016, Sutton was elected Fellow of the Royal Society of Canada.[28] In 2021, he was elected Fellow of the Royal Society of London.[29]

Research and publications

Sutton introduced temporal-difference methods for prediction and control, establishing convergence properties and practical algorithms.[30] He proposed integrating learning and planning through the Dyna architecture.[31] He co-developed the options framework for temporal abstraction in reinforcement learning.[32] He co-authored the first modern policy gradient formulation with function approximation.[33][11][5][16]
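
For reference, the core of TD(0) prediction, in the notation of Sutton and Barto's textbook, is a single update: after a transition from state $S_t$ to $S_{t+1}$ yielding reward $R_{t+1}$, with step size $\alpha$ and discount factor $\gamma$, the value estimate moves toward a bootstrapped target:

$$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$$

The bracketed term is the temporal-difference error: the estimate improves from the difference between successive predictions rather than by waiting for a final outcome.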

His essay "The Bitter Lesson" summarized the view that general methods that scale with computation dominate domain-specific approaches in the long run.[34]

Selected publications

Year | Title | Venue or publisher | Notes
1988 | Learning to predict by the methods of temporal differences | Machine Learning 3, 9–44 | TD learning foundations [35]
1990 | Neural Networks for Control | MIT Press | Co-editor with W. T. Miller III and P. J. Werbos [36]
1991 | Dyna, an integrated architecture for learning, planning, and reacting | ACM SIGART Bulletin | Early Dyna results [37]
1998 | Reinforcement Learning: An Introduction | MIT Press | With Andrew G. Barto; first edition [38]
1999 | Between MDPs and semi-MDPs, a framework for temporal abstraction in RL | Artificial Intelligence 112, 181–211 | Options framework, with Doina Precup and Satinder Singh [39]
2000 | Policy Gradient Methods for Reinforcement Learning with Function Approximation | NeurIPS 12 | Policy gradient theorem with function approximation [40]
2010 | GQ(λ), a general gradient algorithm for temporal-difference prediction learning with eligibility traces | Technical report, University of Alberta | Off-policy TD with gradients, with H. R. Maei [41]
2018 | Reinforcement Learning: An Introduction | MIT Press | With Andrew G. Barto; second edition [42]

References

  1. 1 2 "John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence". markets.businessinsider.com. Retrieved October 2, 2023.
  2. "Exclusive: Interview with Rich Sutton, the Father of Reinforcement Learning". January 11, 2018. Archived from the original on January 11, 2018. Retrieved December 17, 2018.
  3. 1 2 3 Piatetsky, Gregory (December 5, 2017). "Exclusive: Interview with Rich Sutton, the Father of Reinforcement Learning". KDnuggets. Retrieved February 10, 2024.
  4. 1 2 Metz, Cade (March 5, 2025). "Turing Award Goes to 2 Pioneers of Artificial Intelligence". The New York Times. Retrieved August 19, 2025.
  5. 1 2 3 "Dr. Richard Sutton". Association for Computing Machinery. Retrieved October 2, 2025.
  6. "Andrew Barto and Richard Sutton, pioneers in field of reinforcement learning, win AM Turing Award". The Telegraph . March 5, 2025. Retrieved March 10, 2025. Research that Barto, 76, and Sutton, 67, began in the late 1970s paved the way for some of the past decade's AI breakthroughs.
  7. "Rich Sutton, A.M. Turing Award Winner: Understanding Intelligence". Amii. March 5, 2025. Retrieved March 10, 2025. So I'm 67 years old, but I want to still try to do some amazing things.
  8. Heidrich-Meisner, Verena (2009). "Interview with Richard S. Sutton" (PDF). Künstliche intelligenz, Heft.
  9. "Brief Biography for Richard Sutton". incompleteideas.net. Retrieved December 17, 2018.
  10. Sutton, Richard S.; Barto, Andrew (2020). Reinforcement learning: an introduction (Second ed.). Cambridge, Massachusetts: The MIT Press. pp. 22–23. ISBN   978-0-262-03924-6.
  11. 1 2 "Temporal credit assignment in reinforcement learning" (PDF). University of Massachusetts Amherst. February 1984. Retrieved October 2, 2025.
  12. "Richard S. Sutton, Curriculum Vitae" (PDF). incompleteideas.net. Retrieved October 2, 2025.
  13. "Rich Sutton, PhD". University of Alberta. Retrieved October 2, 2025.
  14. "DeepMind expands to Canada with new research office in Edmonton, Alberta". DeepMind. July 5, 2017. Retrieved October 2, 2025.
  15. "Richard S. Sutton". Royal Society of Canada. Retrieved October 2, 2025.
  16. 1 2 "Professor Rich Sutton FRS". The Royal Society. Retrieved October 2, 2025.
  17. 1 2 "Edmonton AI guru Rich Sutton has lost his DeepMind but not his ambition". National Post. March 19, 2023. Retrieved July 2, 2023.
  18. 1 2 Metz, Cade (March 5, 2025). "Turing Award Goes to 2 Pioneers of Artificial Intelligence". The New York Times. ISSN   0362-4331 . Retrieved March 8, 2025.
  19. "A.M. Turing Award". amturing.acm.org. Retrieved March 8, 2025.
  20. "AI pioneers Andrew Barto and Richard Sutton win 2025 Turing Award for groundbreaking contributions to reinforcement learning | NSF – National Science Foundation". www.nsf.gov. March 5, 2025. Retrieved March 8, 2025.
  21. Sutton, Rich (March 13, 2019). "The Bitter Lesson". www.incompleteideas.net. Retrieved September 22, 2022.
  22. Tunstall, Lewis; Werra, Leandro von; Wolf, Thomas (January 26, 2022). Natural Language Processing with Transformers. "O'Reilly Media, Inc.". ISBN   978-1-0981-0319-4.
  23. 1 2 Dwarkesh Patel (September 25, 2025). "Richard Sutton – Father of RL thinks LLMs are a dead end". Dwarkesh Podcast. Retrieved September 28, 2025.
  24. 1 2 "Elected AAAI Fellows". www.aaai.org. Retrieved December 17, 2018.
  25. "INNS Award Recipients". www.inns.org. Retrieved December 17, 2018.
  26. "Outstanding Achievement and Advocacy Award Recipients". College of Information and Computer Sciences, University of Massachusetts Amherst. October 5, 2010. Retrieved December 17, 2018.
  27. "Turing Awardees". National Science Foundation. March 5, 2025. Retrieved March 8, 2025.
  28. Brown, Michael (September 19, 2016). "U of A Scholars Join Ranks of Royal Society". The Quad. Retrieved August 24, 2023.
  29. "Royal Society elects outstanding new Fellows and Foreign Members". royalsociety.org . Retrieved June 8, 2021.
  30. Sutton, Richard S. (1988). "Learning to predict by the methods of temporal differences" (PDF). Machine Learning. 3: 9–44. doi:10.1007/BF00115009.
  31. Sutton, Richard S. (1991). "Dyna, an integrated architecture for learning, planning, and reacting". ACM Sigart Bulletin. 2 (4): 160–163. doi:10.1145/122344.122377 . Retrieved October 2, 2025.
  32. Sutton, Richard S.; Precup, Doina; Singh, Satinder (1999). "Between MDPs and semi-MDPs, a framework for temporal abstraction in reinforcement learning" (PDF). Artificial Intelligence. 112 (1–2): 181–211. doi:10.1016/S0004-3702(99)00052-1.
  33. Sutton, Richard S.; McAllester, David; Singh, Satinder; Mansour, Yishay (2000). Policy Gradient Methods for Reinforcement Learning with Function Approximation, Advances in Neural Information Processing Systems 12 (PDF).
  34. Sutton, Richard S. (March 13, 2019). "The Bitter Lesson". incompleteideas.net. Retrieved October 2, 2025.
  35. Sutton, Richard S. (1988). "Learning to predict by the methods of temporal differences" (PDF). Machine Learning. 3: 9–44. doi:10.1007/BF00115009.
  36. Neural Networks for Control. Neural Network Modeling and Connectionism. MIT Press. March 2, 1995. ISBN   978-0-262-63161-7 . Retrieved October 2, 2025.
  37. Sutton, Richard S. (1991). "Dyna, an integrated architecture for learning, planning, and reacting". ACM Sigart Bulletin. 2 (4): 160–163. doi:10.1145/122344.122377 . Retrieved October 2, 2025.
  38. Sutton, Richard S.; Barto, Andrew G. (1998). Reinforcement Learning, An Introduction. MIT Press. ISBN   0262193981 . Retrieved October 2, 2025.
  39. Sutton, Richard S.; Precup, Doina; Singh, Satinder (1999). "Between MDPs and semi-MDPs, a framework for temporal abstraction in reinforcement learning" (PDF). Artificial Intelligence. 112 (1–2): 181–211. doi:10.1016/S0004-3702(99)00052-1.
  40. Sutton, Richard S.; McAllester, David; Singh, Satinder; Mansour, Yishay (2000). "Policy Gradient Methods for Reinforcement Learning with Function Approximation" (PDF). Advances in Neural Information Processing Systems 12.
  41. "GQ(lambda): A general gradient algorithm for temporal-difference prediction learning with eligibility traces" (PDF). incompleteideas.net. Retrieved October 2, 2025.
  42. Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement Learning, An Introduction (2nd ed.). MIT Press. ISBN   9780262039246 . Retrieved October 2, 2025.