AlphaZero

Last updated

AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.

Contents

On December 5, 2017, the DeepMind team released a preprint paper introducing AlphaZero, which within 24 hours of training achieved a superhuman level of play in these three games by defeating world-champion programs Stockfish, Elmo, and the three-day version of AlphaGo Zero. In each case it made use of custom tensor processing units (TPUs) that the Google programs were optimized to use. [1] AlphaZero was trained solely via self-play using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks, all in parallel, with no access to opening books or endgame tables. After four hours of training, DeepMind estimated AlphaZero was playing chess at a higher Elo rating than Stockfish 8; after nine hours of training, the algorithm defeated Stockfish 8 in a time-controlled 100-game tournament (28 wins, 0 losses, and 72 draws). [1] [2] [3] The trained algorithm played on a single machine with four TPUs.

DeepMind's paper on AlphaZero was published in the journal Science on 7 December 2018; [4] however, the AlphaZero program itself has not been made available to the public. [5] In 2019, DeepMind published a new paper detailing MuZero, a new algorithm able to generalise AlphaZero's work, playing both Atari and board games without knowledge of the rules or representations of the game. [6]

Relation to AlphaGo Zero

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include: [1]

Stockfish and Elmo

Comparing Monte Carlo tree search searches, AlphaZero searches just 80,000 positions per second in chess and 40,000 in shogi, compared to 70 million for Stockfish and 35 million for Elmo. AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variation. [1]

Training

AlphaZero was trained solely via self-play, using 5,000 first-generation TPUs to generate the games and 64 second-generation TPUs to train the neural networks. In parallel, the in-training AlphaZero was periodically matched against its benchmark (Stockfish, Elmo, or AlphaGo Zero) in brief one-second-per-move games to determine how well the training was progressing. DeepMind judged that AlphaZero's performance exceeded the benchmark after around four hours of training for Stockfish, two hours for Elmo, and eight hours for AlphaGo Zero. [1]

Preliminary results

Outcome

Chess

In AlphaZero's chess match against Stockfish 8 (2016 TCEC world champion), each program was given one minute per move. AlphaZero was flying the English flag, while Stockfish the Norwegian. [7] Stockfish was allocated 64 threads and a hash size of 1 GB, [1] a setting that Stockfish's Tord Romstad later criticized as suboptimal. [8] [note 1] AlphaZero was trained on chess for a total of nine hours before the match. During the match, AlphaZero ran on a single machine with four application-specific TPUs. In 100 games from the normal starting position, AlphaZero won 25 games as White, won 3 as Black, and drew the remaining 72. [9] In a series of twelve, 100-game matches (of unspecified time or resource constraints) against Stockfish starting from the 12 most popular human openings, AlphaZero won 290, drew 886 and lost 24. [1]

Shogi

AlphaZero was trained on shogi for a total of two hours before the tournament. In 100 shogi games against Elmo (World Computer Shogi Championship 27 summer 2017 tournament version with YaneuraOu 4.73 search), AlphaZero won 90 times, lost 8 times and drew twice. [9] As in the chess games, each program got one minute per move, and Elmo was given 64 threads and a hash size of 1 GB. [1]

Go

After 34 hours of self-learning of Go and against AlphaGo Zero, AlphaZero won 60 games and lost 40. [1] [9]

Analysis

DeepMind stated in its preprint, "The game of chess represented the pinnacle of AI research over several decades. State-of-the-art programs are based on powerful engines that search many millions of positions, leveraging handcrafted domain expertise and sophisticated domain adaptations. AlphaZero is a generic reinforcement learning algorithm  originally devised for the game of go  that achieved superior results within a few hours, searching a thousand times fewer positions, given no domain knowledge except the rules." [1] DeepMind's Demis Hassabis, a chess player himself, called AlphaZero's play style "alien": It sometimes wins by offering counterintuitive sacrifices, like offering up a queen and bishop to exploit a positional advantage. "It's like chess from another dimension." [10]

Given the difficulty in chess of forcing a win against a strong opponent, the +28 –0 =72 result is a significant margin of victory. However, some grandmasters, such as Hikaru Nakamura and Komodo developer Larry Kaufman, downplayed AlphaZero's victory, arguing that the match would have been closer if the programs had access to an opening database (since Stockfish was optimized for that scenario). [11] Romstad additionally pointed out that Stockfish is not optimized for rigidly fixed-time moves and the version used was a year old. [8] [12]

Similarly, some shogi observers argued that the Elmo hash size was too low, that the resignation settings and the "EnteringKingRule" settings (cf. shogi § Entering King) may have been inappropriate, and that Elmo is already obsolete compared with newer programs. [13] [14]

Reaction and criticism

Papers headlined that the chess training took only four hours: "It was managed in little more than the time between breakfast and lunch." [2] [15] Wired described AlphaZero as "the first multi-skilled AI board-game champ". [16] AI expert Joanna Bryson noted that Google's "knack for good publicity" was putting it in a strong position against challengers. "It's not only about hiring the best programmers. It's also very political, as it helps make Google as strong as possible when negotiating with governments and regulators looking at the AI sector." [9]

Human chess grandmasters generally expressed excitement about AlphaZero. Danish grandmaster Peter Heine Nielsen likened AlphaZero's play to that of a superior alien species. [9] Norwegian grandmaster Jon Ludvig Hammer characterized AlphaZero's play as "insane attacking chess" with profound positional understanding. [2] Former champion Garry Kasparov said, "It's a remarkable achievement, even if we should have expected it after AlphaGo." [11] [17]

Grandmaster Hikaru Nakamura was less impressed, stating: "I don't necessarily put a lot of credibility in the results simply because my understanding is that AlphaZero is basically using the Google supercomputer and Stockfish doesn't run on that hardware; Stockfish was basically running on what would be my laptop. If you wanna have a match that's comparable you have to have Stockfish running on a supercomputer as well." [8]

Top US correspondence chess player Wolff Morrow was also unimpressed, claiming that AlphaZero would probably not make the semifinals of a fair competition such as TCEC where all engines play on equal hardware. Morrow further stated that although he might not be able to beat AlphaZero if AlphaZero played drawish openings such as the Petroff Defence, AlphaZero would not be able to beat him in a correspondence chess game either. [18]

Motohiro Isozaki, the author of YaneuraOu, noted that although AlphaZero did comprehensively beat Elmo, the rating of AlphaZero in shogi stopped growing at a point which is at most 100–200 higher than Elmo. This gap is not that high, and Elmo and other shogi software should be able to catch up in 1–2 years. [19]

Final results

DeepMind addressed many of the criticisms in their final version of the paper, published in December 2018 in Science . [4] They further clarified that AlphaZero was not running on a supercomputer; it was trained using 5,000 tensor processing units (TPUs), but only ran on four TPUs and a 44-core CPU in its matches. [20]

Chess

In the final results, Stockfish version 8 ran under the same conditions as in the TCEC superfinal: 44 CPU cores, Syzygy endgame tablebases, and a 32GB hash size. Instead of a fixed time control of one move per minute, both engines were given 3 hours plus 15 seconds per move to finish the game. In a 1000-game match, AlphaZero won with a score of 155 wins, 6 losses, and 839 draws. DeepMind also played a series of games using the TCEC opening positions; AlphaZero also won convincingly. Stockfish needed 10-to-1 time odds to match AlphaZero. [21]

Shogi

Similar to Stockfish, Elmo ran under the same conditions as in the 2017 CSA championship. The version of Elmo used was WCSC27 in combination with YaneuraOu 2017 Early KPPT 4.79 64AVX2 TOURNAMENT. Elmo operated on the same hardware as Stockfish: 44 CPU cores and a 32GB hash size. AlphaZero won 98.2% of games when playing sente (i.e. having the first move) and 91.2% overall.

Reactions and criticisms

Human grandmasters were generally impressed with AlphaZero's games against Stockfish. [21] Former world champion Garry Kasparov said it was a pleasure to watch AlphaZero play, especially since its style was open and dynamic like his own. [22] [23]

In the computer chess community, Komodo developer Mark Lefler called it a "pretty amazing achievement", but also pointed out that the data was old, since Stockfish had gained a lot of strength since January 2018 (when Stockfish 8 was released). Fellow developer Larry Kaufman said AlphaZero would probably lose a match against the latest version of Stockfish, Stockfish 10, under Top Chess Engine Championship (TCEC) conditions. Kaufman argued that the only advantage of neural network–based engines was that they used a GPU, so if there was no regard for power consumption (e.g. in an equal-hardware contest where both engines had access to the same CPU and GPU) then anything the GPU achieved was "free". Based on this, he stated that the strongest engine was likely to be a hybrid with neural networks and standard alpha–beta search. [24]

AlphaZero inspired the computer chess community to develop Leela Chess Zero, using the same techniques as AlphaZero. Leela contested several championships against Stockfish, where it showed roughly similar strength to Stockfish, although Stockfish has since pulled away. [25]

In 2019 DeepMind published MuZero, a unified system that played excellent chess, shogi, and go, as well as games in the Atari Learning Environment, without being pre-programmed with their rules. [26] [27]

See also

Notes

  1. Stockfish developer Tord Romstad responded with
    The match results by themselves are not particularly meaningful because of the rather strange choice of time controls and Stockfish parameter settings: The games were played at a fixed time of 1 minute/move, which means that Stockfish has no use of its time management heuristics (lot of effort has been put into making Stockfish identify critical points in the game and decide when to spend some extra time on a move; at a fixed time per move, the strength will suffer significantly). The version of Stockfish used is one year old, was playing with far more search threads than has ever received any significant amount of testing, and had way too small hash tables for the number of threads. I believe the percentage of draws would have been much higher in a match with more normal conditions. [8]

Related Research Articles

<span class="mw-page-title-main">Computer chess</span> Computer hardware and software capable of playing chess

Computer chess includes both hardware and software capable of playing chess. Computer chess provides opportunities for players to practice even in the absence of human opponents, and also provides opportunities for analysis, entertainment and training. Computer chess applications that play at the level of a chess grandmaster or higher are available on hardware from supercomputers to smart phones. Standalone chess-playing machines are also available. Stockfish, Leela Chess Zero, GNU Chess, Fruit, and other free open source applications are available for various platforms.

<span class="mw-page-title-main">Evaluation function</span> Function in a computer game-playing program that evaluates a game position

An evaluation function, also known as a heuristic evaluation function or static evaluation function, is a function used by game-playing computer programs to estimate the value or goodness of a position in a game tree. Most of the time, the value is either a real number or a quantized integer, often in nths of the value of a playing piece such as a stone in go or a pawn in chess, where n may be tenths, hundredths or other convenient fraction, but sometimes, the value is an array of three values in the unit interval, representing the win, draw, and loss percentages of the position.

<span class="mw-page-title-main">Demis Hassabis</span> British entrepreneur and artificial intelligence researcher (born 1976)

Demis Hassabis is a British computer scientist, artificial intelligence researcher and entrepreneur. In his early career he was a video game AI programmer and designer, and an expert board games player. He is the chief executive officer and co-founder of DeepMind and Isomorphic Labs, and a UK Government AI Advisor. He is a Fellow of the Royal Society, and has won many prestigious awards for his work on AlphaFold including the Breakthrough Prize, the Canada Gairdner International Award, and the Lasker Award. In 2017 he was appointed a CBE and listed in the Time 100 most influential people list.

Computer shogi is a field of artificial intelligence concerned with the creation of computer programs which can play shogi. The research and development of shogi software has been carried out mainly by freelance programmers, university research groups and private companies. By 2017, the strongest programs were outperforming the strongest human players.

<span class="mw-page-title-main">Stockfish (chess)</span> Free and open-source chess engine

Stockfish is a free and open-source chess engine, available for various desktop and mobile platforms. It can be used in chess software through the Universal Chess Interface.

<span class="mw-page-title-main">Houdini (chess)</span> UCI chess engine

Houdini is a UCI chess engine developed by Belgian programmer Robert Houdart. It is influenced by open-source engines IPPOLIT/RobboLito, Stockfish, and Crafty. Versions up to 1.5a are available for non-commercial use, while 2.0 and later are commercial only.

<span class="mw-page-title-main">Google DeepMind</span> Artificial intelligence division

DeepMind Technologies Limited, doing business as Google DeepMind, is a British-American artificial intelligence research laboratory which serves as a subsidiary of Google. Founded in the UK in 2010, it was acquired by Google in 2014, The company is based in London, with research centres in Canada, France, Germany and the United States.

AlphaGo is a computer program that plays the board game Go. It was developed by the London-based DeepMind Technologies, an acquired subsidiary of Google. Subsequent versions of AlphaGo became increasingly powerful, including a version that competed under the name Master. After retiring from competitive play, AlphaGo Master was succeeded by an even more powerful version known as AlphaGo Zero, which was completely self-taught without learning from human games. AlphaGo Zero was then generalized into a program known as AlphaZero, which played additional games, including chess and shogi. AlphaZero has in turn been succeeded by a program known as MuZero which learns without being taught the rules.

David Silver is a principal research scientist at Google DeepMind and a professor at University College London. He has led research on reinforcement learning with AlphaGo, AlphaZero and co-lead on AlphaStar.

AlphaGo versus Ke Jie was a three-game Go match between the computer Go program AlphaGo Master and current world No. 1 ranking player Ke Jie, being part of the Future of Go Summit in Wuzhen, China, played on 23, 25, and 27 May 2017. AlphaGo defeated Ke Jie in all three games.

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version. By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.

Leela Zero is a free and open-source computer Go program released on 25 October 2017. It is developed by Belgian programmer Gian-Carlo Pascutto, the author of chess engine Sjeng and Go engine Leela.

Elmo is a computer shogi evaluation function and book file (joseki) created by Makoto Takizawa (瀧澤誠). It is designed to be used with a third-party shogi alpha–beta search engine.

<span class="mw-page-title-main">Leela Chess Zero</span> Deep neural network-based chess engine

Leela Chess Zero is a free, open-source, and deep neural network–based chess engine and volunteer computing project. Development has been spearheaded by programmer Gary Linscott, who is also a developer for the Stockfish chess engine. Leela Chess Zero was adapted from the Leela Zero Go engine, which in turn was based on Google's AlphaGo Zero project. One of the purposes of Leela Chess Zero was to verify the methods in the AlphaZero paper as applied to the game of chess.

AlphaStar is a computer program by DeepMind that plays the video game StarCraft II. It was unveiled to the public by name in January 2019. In a significant milestone for artificial intelligence, AlphaStar attained Grandmaster status in August 2019.

Timothy P. Lillicrap is a Canadian neuroscientist and AI researcher, adjunct professor at University College London, and staff research scientist at Google DeepMind, where he has been involved in the AlphaGo and AlphaZero projects mastering the games of Go, Chess and Shogi. His research focuses on machine learning and statistics for optimal control and decision making, as well as using these mathematical frameworks to understand how the brain learns. He has developed algorithms and approaches for exploiting deep neural networks in the context of reinforcement learning, and new recurrent memory architectures for one-shot learning.

<span class="mw-page-title-main">MuZero</span> Game-playing artificial intelligence

MuZero is a computer program developed by artificial intelligence research company DeepMind to master games without knowing their rules. Its release in 2019 included benchmarks of its performance in go, chess, shogi, and a standard suite of Atari games. The algorithm uses an approach similar to AlphaZero. It matched AlphaZero's performance in chess and shogi, improved on its performance in Go, and improved on the state of the art in mastering a suite of 57 Atari games, a visually-complex domain.

The history of chess began nearly 150 years ago, and over the past century and a half the game has changed drastically. No technology or strategy, however, has changed chess as much as the introduction of chess engines. Despite only coming into existence within the previous 70 years, the introduction of chess engines has molded and defined how top chess is played today.

Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".

AlphaDev is an artificial intelligence system developed by Google DeepMind to discover enhanced computer science algorithms using reinforcement learning. AlphaDev is based on AlphaZero, a system that mastered the games of chess, shogi and go by self-play. AlphaDev applies the same approach to finding faster algorithms for fundamental tasks such as sorting and hashing.

References

  1. 1 2 3 4 5 6 7 8 9 10 Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (December 5, 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv: 1712.01815 [cs.AI].
  2. 1 2 3 Knapton, Sarah; Watson, Leon (December 6, 2017). "Entire human chess knowledge learned and surpassed by DeepMind's AlphaZero in four hours". Telegraph.co.uk . Retrieved December 6, 2017.
  3. Vincent, James (December 6, 2017). "DeepMind's AI became a superhuman chess player in a few hours, just for fun". The Verge . Retrieved December 6, 2017.
  4. 1 2 Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (December 7, 2018). "A general reinforcement learning algorithm that masters chess, shogi, and go through self-play". Science . 362 (6419): 1140–1144. Bibcode:2018Sci...362.1140S. doi: 10.1126/science.aar6404 . PMID   30523106.
  5. "Chess Terms: AlphaZero". Chess.com. Retrieved July 30, 2022.
  6. Schrittwieser, Julian; Antonoglou, Ioannis; Hubert, Thomas; Simonyan, Karen; Sifre, Laurent; Schmitt, Simon; Guez, Arthur; Lockhart, Edward; Hassabis, Demis; Graepel, Thore; Lillicrap, Timothy (2020). "Mastering Atari, Go, chess and shogi by planning with a learned model". Nature. 588 (7839): 604–609. arXiv: 1911.08265 . Bibcode:2020Natur.588..604S. doi:10.1038/s41586-020-03051-4. PMID   33361790. S2CID   208158225.
  7. "AlphaZero vs. Stockfish 2017".
  8. 1 2 3 4 "AlphaZero: Reactions From Top GMs, Stockfish Author". chess.com. December 8, 2017. Retrieved December 9, 2017.
  9. 1 2 3 4 5 "'Superhuman' Google AI claims chess crown". BBC News. December 6, 2017. Retrieved December 7, 2017.
  10. Knight, Will (December 8, 2017). "Alpha Zero's "Alien" Chess Shows the Power, and the Peculiarity, of AI". MIT Technology Review . Retrieved December 11, 2017.
  11. 1 2 "Google's AlphaZero Destroys Stockfish In 100-Game Match". Chess.com . Retrieved December 7, 2017.
  12. Katyanna Quach. "DeepMind's AlphaZero AI clobbered rival chess app on non-level playing...board". The Register (December 14, 2017).
  13. "Some concerns on the matching conditions between AlphaZero and Shogi engine". コンピュータ将棋 レーティング. "uuunuuun" (a blogger who rates free shogi engines). Retrieved December 9, 2017. (via "瀧澤 誠@elmo (@mktakizawa) | Twitter". mktakizawa (elmo developer). December 9, 2017. Retrieved December 11, 2017.)
  14. "DeepMind社がやねうら王に注目し始めたようです". The developer of YaneuraOu, a search component used by elmo. December 7, 2017. Retrieved December 9, 2017.
  15. Badshah, Nadeem (December 7, 2017). "Google's DeepMind robot becomes world-beating chess grandmaster in four hours". The Times of London . Retrieved December 7, 2017.
  16. "Alphabet's Latest AI Show Pony Has More Than One Trick". WIRED. December 6, 2017. Retrieved December 7, 2017.
  17. Gibbs, Samuel (December 7, 2017). "AlphaZero AI beats champion chess program after teaching itself in four hours". The Guardian. Retrieved December 8, 2017.
  18. "Talking modern correspondence chess". Chessbase. June 26, 2018. Retrieved July 11, 2018.
  19. DeepMind社がやねうら王に注目し始めたようです | やねうら王 公式サイト, 2017年12月7日
  20. As given in the Science paper, a TPU is "roughly similar in inference speed to a Titan V GPU, although the architectures are not directly comparable" (Ref. 24).
  21. 1 2 "AlphaZero Crushes Stockfish In New 1,000-Game Match". December 6, 2018.
  22. Sean Ingle (December 11, 2018). "'Creative' AlphaZero leads way for chess computers and, maybe, science". The Guardian.
  23. Albert Silver (December 7, 2018). "Inside the (deep) mind of AlphaZero". Chessbase.
  24. "Komodo MCTS (Monte Carlo Tree Search) is the new star of TCEC". Chessdom. December 18, 2018.
  25. See TCEC and Leela Chess Zero.
  26. "Could Artificial Intelligence Save Us From Itself?". Fortune. 2019. Retrieved February 29, 2020.
  27. "DeepMind's MuZero teaches itself how to win at Atari, chess, shogi, and Go". VentureBeat. November 20, 2019. Retrieved February 29, 2020.