Timothy P. Lillicrap | |
---|---|
Nationality | Canadian |
Alma mater | Queen's University |
Known for | Optimal control, decision making, deep learning, AlphaGo, AlphaZero |
Awards | CfNS Award for Excellence; Governor General's Academic Medal |
Scientific career | |
Fields | Artificial intelligence; theoretical neuroscience |
Institutions | Google DeepMind; University College London |
Doctoral advisor | Stephen H. Scott |
Timothy P. Lillicrap is a Canadian neuroscientist and AI researcher, adjunct professor at University College London, and staff research scientist at Google DeepMind, where he has worked on the AlphaGo and AlphaZero projects that mastered the games of Go, chess, and shogi. His research focuses on machine learning and statistics for optimal control and decision making, as well as on using these mathematical frameworks to understand how the brain learns. He has developed algorithms and approaches for exploiting deep neural networks in the context of reinforcement learning, as well as new recurrent memory architectures for one-shot learning. [1]
His contributions to the field have earned him a number of honors, including the Governor General's Academic Medal, an NSERC Fellowship, the Centre for Neuroscience Studies Award for Excellence, and several European Research Council grants. He has also won a number of social learning tournaments.
Lillicrap earned a B.Sc. in cognitive science and artificial intelligence from the University of Toronto in 2005, and a Ph.D. in systems neuroscience from Queen's University in 2012 under Stephen H. Scott. He then became a postdoctoral research fellow at Oxford University, and joined Google DeepMind as a research scientist in 2014. Following a series of promotions, he became a DeepMind staff research scientist in 2016, a position he still holds as of 2021. [2] [3]
In 2016, Lillicrap accepted an adjunct professorship at University College London. [4]
Timothy Lillicrap has an extensive publication record spanning theoretical neuroscience and machine learning.
An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing computer programs to estimate the value or goodness of a position in a game tree. Most of the time, the value is either a real number or a quantized integer, often expressed in fractions of the value of a playing piece such as a stone in Go or a pawn in chess, with tenths, hundredths, or another convenient fraction as the unit; sometimes, however, the value is an array of three values in the unit interval, representing the win, draw, and loss percentages of the position.
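As a concrete illustration, here is a minimal static evaluation function in Python based on material counting in centipawns (hundredths of a pawn). The piece values and board encoding are conventional illustrations invented for this sketch, not taken from any particular engine.

```python
# Conventional chess material values, in centipawns (pawn = 100).
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}

def evaluate(board):
    """Score a position from White's perspective, in centipawns.

    `board` is a string of piece letters: uppercase for White,
    lowercase for Black; the king and anything else are ignored.
    """
    score = 0
    for square in board:
        if square.upper() in PIECE_VALUES:
            value = PIECE_VALUES[square.upper()]
            score += value if square.isupper() else -value
    return score

# White has an extra knight: the evaluation is +320 centipawns.
print(evaluate("RNBQKBNR" + "PPPPPPPP" + "pppppppp" + "rbqkbnr"))
```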
In the game of Go, a ladder (征子) is a basic sequence of moves in which an attacker pursues a group in atari in a zig-zag pattern across the board. If there are no intervening stones, the group will hit the edge of the board and be captured.
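The outcome can be sketched with a deliberately simplified model: the chased group flees along a diagonal, and escapes only if a friendly "ladder breaker" stone lies on that path before the edge. Real ladder reading tracks liberties move by move; the diagonal-path model below is an illustrative assumption.

```python
# Simplified ladder check: does the diagonal escape path reach a
# friendly breaker stone before running off the board?
BOARD_SIZE = 19

def ladder_is_captured(start, direction, breakers):
    """start: (row, col) of the chased stone; direction: (dr, dc)
    diagonal step; breakers: set of stones friendly to the runner."""
    r, c = start
    dr, dc = direction
    while 0 <= r < BOARD_SIZE and 0 <= c < BOARD_SIZE:
        if (r, c) in breakers:
            return False  # a breaker stone lets the group escape
        r, c = r + dr, c + dc
    return True  # the path hit the edge: the group is captured

# Without breakers the runner is caught; a stone on the diagonal saves it.
print(ladder_is_captured((3, 3), (1, 1), set()))       # True
print(ladder_is_captured((3, 3), (1, 1), {(10, 10)}))  # False
```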
The KGS Go Server, known until 2006 as the Kiseido Go Server, is a game server first developed in 1999 and established in 2000 for people to play Go. The system was developed by William M. Shubert and its code is now written entirely in Java. In the spring of 2017, Shubert transferred ownership to the American Go Foundation.
Dimitri Panteli Bertsekas is an applied mathematician, electrical engineer, and computer scientist, a McAfee Professor in the Department of Electrical Engineering and Computer Science in the School of Engineering at the Massachusetts Institute of Technology (MIT), Cambridge, Massachusetts, and also a Fulton Professor of Computational Decision Making at Arizona State University, Tempe.
DeepMind Technologies is a British artificial intelligence research laboratory founded in 2010. It was acquired by Google in 2014 and became a wholly owned subsidiary of Alphabet Inc. after Google's restructuring in 2015. The company is based in London, with research centres in Canada, France, and the United States.
AlphaGo is a computer program that plays the board game Go. It was developed by London-based DeepMind Technologies, a subsidiary of Google. Subsequent versions of AlphaGo became increasingly powerful, including a version that competed under the name Master. After retiring from competitive play, AlphaGo Master was succeeded by an even more powerful version known as AlphaGo Zero, which was completely self-taught without learning from human games. AlphaGo Zero was then generalized into a program known as AlphaZero, which played additional games, including chess and shogi. AlphaZero has in turn been succeeded by a program known as MuZero, which learns without being taught the rules.
Fan Hui is a Chinese-born French Go player. He became a professional Go player in 1996, moved to France in 2000, and became the coach of the French national Go team in 2005. He was the winner of the European Go Championship in 2013, 2014 and 2015. As of 2015, he is ranked as a 2 dan professional. He additionally won the 2016 European Professional Go Championship.
Darkforest is a computer Go program developed by Meta Platforms, based on deep learning techniques using a convolutional neural network. Its updated version Darkfores2 combines the techniques of its predecessor with Monte Carlo tree search (MCTS). MCTS replaces the exhaustive tree search commonly seen in computer chess programs with statistics gathered from randomized playouts. With the update, the system is known as Darkfmcts3.
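To make the idea concrete, here is a minimal UCT-style Monte Carlo tree search in Python, illustrated on the simple game of Nim rather than on Go or on Darkforest's own machinery; the game, exploration constant, and iteration count are assumptions chosen for the sketch.

```python
import math, random

random.seed(1)

# Nim: take 1-3 sticks from a pile; taking the last stick wins.
def moves(sticks):
    return [t for t in (1, 2, 3) if t <= sticks]

class Node:
    def __init__(self, sticks, parent=None, move=None):
        self.sticks, self.parent, self.move = sticks, parent, move
        self.children, self.untried = [], moves(sticks)
        self.wins, self.visits = 0.0, 0

    def ucb_child(self, c=1.4):
        # Upper-confidence bound balances exploitation and exploration.
        return max(self.children, key=lambda n: n.wins / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))

def rollout(sticks):
    # Random playout: +1 if the player to move from `sticks` wins.
    sign = 1
    while sticks > 0:
        sticks -= random.choice(moves(sticks))
        sign = -sign
    return -sign  # the player who took the last stick just moved

def mcts(root_sticks, iters=2000):
    root = Node(root_sticks)
    for _ in range(iters):
        node = root
        # 1. Selection: descend fully expanded nodes by UCB.
        while not node.untried and node.children:
            node = node.ucb_child()
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            t = node.untried.pop()
            child = Node(node.sticks - t, parent=node, move=t)
            node.children.append(child)
            node = child
        # 3. Simulation: a random playout instead of exhaustive search.
        result = rollout(node.sticks)
        # 4. Backpropagation: flip perspective at each level.
        while node is not None:
            node.visits += 1
            node.wins += (1 - result) / 2  # wins for the player who moved into node
            result = -result
            node = node.parent
    return max(root.children, key=lambda n: n.visits).move

print("Suggested first move from 21 sticks:", mcts(21))  # optimal play takes 1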
Tygem is an internet Go server owned by the South Korean company TongYang Online. Popular in Asia, its website states that over 500 professional Go players use the service.
David Silver is a British computer scientist and businessman who leads the reinforcement learning research group at DeepMind and was lead researcher on AlphaGo, AlphaZero and co-lead on AlphaStar.
Aja Huang is a Taiwanese computer scientist and expert on artificial intelligence. He works for DeepMind and was a member of the AlphaGo project.
AlphaGo versus Ke Jie was a three-game Go match between the computer Go program AlphaGo Master and Ke Jie, then the world No. 1 ranked player, played as part of the Future of Go Summit in Wuzhen, China, on 23, 25, and 27 May 2017. AlphaGo defeated Ke Jie in all three games.
AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.
AlphaGo versus Fan Hui was a five-game Go match between European champion Fan Hui, a 2-dan professional, and AlphaGo, a computer Go program developed by DeepMind, held at DeepMind's headquarters in London in October 2015. AlphaGo won all five games. This was the first time a computer Go program had beaten a professional human player on a full-sized board without handicap. This match was not disclosed to the public until 27 January 2016 to coincide with the publication of a paper in the journal Nature describing the algorithms AlphaGo used.
AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.
Elmo, stylized as elmo, is a computer shogi evaluation function and book file (joseki) created by Makoto Takizawa (瀧澤誠). It is designed to be used with a third-party shogi alpha–beta search engine.
Deep reinforcement learning is a subfield of machine learning that combines reinforcement learning (RL) and deep learning. RL considers the problem of a computational agent learning to make decisions by trial and error. Deep RL incorporates deep learning into the solution, allowing agents to make decisions from unstructured input data without manual engineering of the state space. Deep RL algorithms are able to take in very large inputs and decide what actions to perform to optimize an objective. Deep reinforcement learning has been used for a diverse set of applications including but not limited to robotics, video games, natural language processing, computer vision, education, transportation, finance and healthcare.
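As a minimal sketch of the idea, the following NumPy snippet trains a tiny two-layer Q-network by trial and error on a toy corridor environment; the environment, network sizes, and hyperparameters are invented for illustration and are not drawn from any DeepMind system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corridor: states 0..N-1, start at 0, reward +1 on reaching the right end.
N = 6
ACTIONS = (-1, +1)  # move left, move right

def step(s, a):
    s2 = min(max(s + ACTIONS[a], 0), N - 1)
    done = s2 == N - 1
    return s2, (1.0 if done else 0.0), done

def onehot(s):
    x = np.zeros(N)
    x[s] = 1.0
    return x

# A tiny two-layer Q-network: one ReLU hidden layer, two action values out.
H = 16
W1 = rng.normal(0.0, 0.5, (H, N)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (2, H)); b2 = np.zeros(2)

def qvals(x):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2, h

gamma, lr, eps = 0.9, 0.05, 0.2
for episode in range(500):
    s, done = 0, False
    while not done:
        x = onehot(s)
        q, h = qvals(x)
        # Epsilon-greedy action selection: explore, else pick the best Q.
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
        s2, r, done = step(s, a)
        target = r if done else r + gamma * np.max(qvals(onehot(s2))[0])
        td = target - q[a]  # temporal-difference error
        # Semi-gradient update of the chosen action's Q-value.
        dh = td * W2[a] * (h > 0)
        W2[a] += lr * td * h
        b2[a] += lr * td
        W1 += lr * np.outer(dh, x)
        b1 += lr * dh
        s = s2

# The greedy policy should choose "right" (action 1) in every state.
print([int(np.argmax(qvals(onehot(s))[0])) for s in range(N)])
```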
In video games, various artificial intelligence techniques have been used in a variety of ways, ranging from non-player character (NPC) control to procedural content generation (PCG). Machine learning is a subset of artificial intelligence that uses algorithms and statistical models to let machines improve at a task from data rather than through explicitly programmed behavior. This is in sharp contrast to traditional artificial intelligence methods such as search trees and expert systems.
MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules. Its release in 2019 included benchmarks of its performance in Go, chess, shogi, and a standard suite of Atari games. The algorithm uses an approach similar to AlphaZero. It matched AlphaZero's performance in chess and shogi, improved on its performance in Go, and improved on the state of the art in mastering a suite of 57 Atari games, a visually complex domain.
Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".
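A minimal sketch of the idea in Python: a single Q-table improves by playing both sides of the game of Nim against itself. The game, learning rate, and exploration rate are illustrative assumptions for the sketch, not a description of any particular self-play system.

```python
import random

random.seed(0)

# Nim: take 1-3 sticks from a pile of 21; taking the last stick wins.
def legal(sticks):
    return [t for t in (1, 2, 3) if t <= sticks]

Q = {}  # (sticks, take) -> estimated return for the player to move
ALPHA, EPS = 0.1, 0.2

def choose(sticks):
    if random.random() < EPS:  # occasional random exploration
        return random.choice(legal(sticks))
    return max(legal(sticks), key=lambda t: Q.get((sticks, t), 0.0))

for episode in range(20000):
    sticks, history = 21, []
    while sticks > 0:           # the same policy plays both sides
        t = choose(sticks)
        history.append((sticks, t))
        sticks -= t
    # Monte Carlo update: the last mover won (+1). Alternate the sign
    # going backward, since consecutive plies belong to opposing players.
    ret = 1.0
    for s, t in reversed(history):
        old = Q.get((s, t), 0.0)
        Q[(s, t)] = old + ALPHA * (ret - old)
        ret = -ret

# After training, the greedy policy should leave a multiple of 4 sticks.
print([max(legal(s), key=lambda t: Q.get((s, t), 0.0)) for s in range(1, 9)])
```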