KataGo

Developer(s): David Wu
Initial release: 27 February 2019
Stable release: 1.15.3 / 5 August 2024
Written in: C++
Type: Go software
License: MIT License
Website: katagotraining.org

KataGo is a free and open-source computer Go program, capable of defeating top-level human players. First released on 27 February 2019, it is developed by David Wu, [1] who also developed the Arimaa playing program bot_Sharp which defeated three top human players to win the Arimaa AI Challenge in 2015. [2]

KataGo's first release was trained by David Wu using resources provided by his employer Jane Street Capital, [3] but it is now trained by a distributed effort. [4] Members of the computer Go community provide computing resources by running the client, which generates self-play games and rating games, and submits them to a server. The self-play games are used to train newer networks and the rating games to evaluate the networks' relative strengths.

KataGo supports the Go Text Protocol, with various extensions, [5] making it compatible with popular GUIs such as Lizzie. As an alternative, it also implements a custom "analysis engine" protocol, which is used by the KaTrain GUI, [6] among others. KataGo is widely used by strong human Go players, including the South Korean national team, for training purposes. [7] [8] KataGo is also used as the default analysis engine on the online Go website AI Sensei, [9] as well as on OGS (the Online Go Server). [10]
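
For example, the analysis engine reads one JSON query per line on standard input and writes one JSON response per line on standard output. The Python sketch below shows a minimal interaction; the model and configuration file paths are placeholders, and the field names should be checked against the analysis engine documentation for the version in use.

```python
# Minimal sketch of querying KataGo's JSON "analysis engine" from Python.
# The config and model paths below are placeholders for illustration only.
import json
import subprocess

katago = subprocess.Popen(
    ["katago", "analysis", "-config", "analysis.cfg", "-model", "model.bin.gz"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

# One query per line: analyze the position after Black D4, White Q16.
query = {
    "id": "example",
    "moves": [["B", "D4"], ["W", "Q16"]],
    "rules": "tromp-taylor",
    "komi": 7.5,
    "boardXSize": 19,
    "boardYSize": 19,
    "analyzeTurns": [2],   # analyze the position after move 2
    "maxVisits": 100,
}
katago.stdin.write(json.dumps(query) + "\n")
katago.stdin.flush()

# Each response is a single JSON line; moveInfos lists candidate moves
# with visit counts, winrates, and estimated score leads.
response = json.loads(katago.stdout.readline())
for info in response["moveInfos"][:3]:
    print(info["move"], info["winrate"], info["scoreLead"])
```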

Technology

Based on techniques used by DeepMind's AlphaGo Zero, KataGo implements Monte Carlo tree search with a convolutional neural network providing position evaluation and policy guidance. Compared to AlphaGo, KataGo introduces many refinements that enable it to learn faster and play more strongly. [1] [11] Notable features of KataGo that are absent in many other Go-playing programs include score estimation; support for small boards, arbitrary values of komi, and handicaps; and the ability to use various Go rulesets and adjust its play and evaluation for the small differences between them.
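
Schematically, each search iteration selects moves with a PUCT rule that combines the network's value estimates with its policy priors, then backs the new evaluation up the tree. The Python sketch below illustrates this AlphaZero-style loop under assumed interfaces (a `net(state)` callable and a `state.play(move)` method); it is a simplification for illustration, not KataGo's actual search, which adds many further refinements.

```python
# Illustrative AlphaZero-style MCTS selection, expansion, and backup.
# `net(state)` is assumed to return (policy_dict, value); `state.play(move)`
# is an assumed game-state API. This is not KataGo's actual search code.
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the policy head
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    # PUCT: argmax over Q(s,a) + c * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
    total = sum(child.visits for child in node.children.values())
    return max(
        node.children.items(),
        key=lambda kv: kv[1].q()
        + c_puct * kv[1].prior * math.sqrt(total + 1) / (1 + kv[1].visits),
    )

def simulate(root, state, net):
    """Run one simulation: select a leaf, expand it with the network, back up."""
    path, node = [root], root
    while node.children:
        move, node = select_child(node)
        state = state.play(move)
        path.append(node)
    policy, value = net(state)        # network evaluation replaces rollouts
    for move, prior in policy.items():
        node.children[move] = Node(prior)
    for n in reversed(path):          # back the value up the visited path
        n.visits += 1
        n.value_sum += value
        value = -value                # flip perspective between players
```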

Network

The neural networks used in KataGo are residual networks (ResNets) with pre-activation.
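
In a pre-activation residual block, normalization and the nonlinearity are applied before each convolution rather than after it. The PyTorch block below is an illustrative sketch of this layout, not KataGo's exact layer definitions.

```python
# Illustrative pre-activation residual block (BatchNorm -> ReLU -> Conv),
# not KataGo's exact layer definitions.
import torch
from torch import nn

class PreActResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity skip connection
```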

While AlphaGo Zero takes only the game board history as input (it was designed as a general architecture for board games and subsequently became AlphaZero), the input to KataGo's network contains additional features designed by hand specifically for playing Go. These features include liberties, komi parity, pass-alive regions, and ladders.
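
These features can be pictured as extra planes stacked onto the stone and history planes of the input. The NumPy sketch below builds an illustrative set of liberty planes; the actual feature set and encoding used by KataGo differ and are specified in Appendix A of the paper.

```python
# Illustrative construction of extra input planes marking stones by the
# number of liberties of their group; KataGo's real feature encoding differs.
import numpy as np

def group_liberties(board, x, y):
    """Count liberties of the group containing (x, y) via flood fill."""
    size, color = len(board), board[y][x]
    stack, seen, libs = [(x, y)], {(x, y)}, set()
    while stack:
        cx, cy = stack.pop()
        for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if 0 <= nx < size and 0 <= ny < size:
                if board[ny][nx] == 0:
                    libs.add((nx, ny))
                elif board[ny][nx] == color and (nx, ny) not in seen:
                    seen.add((nx, ny))
                    stack.append((nx, ny))
    return len(libs)

def liberty_planes(board, num_planes=3):
    """board[y][x] in {0 empty, 1 black, 2 white}; plane k marks stones whose
    group has k+1 liberties, with the last plane meaning num_planes or more."""
    size = len(board)
    planes = np.zeros((num_planes, size, size), dtype=np.float32)
    for y in range(size):
        for x in range(size):
            if board[y][x] != 0:
                libs = group_liberties(board, x, y)
                planes[min(libs, num_planes) - 1, y, x] = 1.0
    return planes
```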

The trunk is essentially the same as in AlphaGo Zero, but with global pooling modules added to allow the network to internally condition on global context such as ko fights. This is similar to Squeeze-and-Excitation networks. [1]
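
Conceptually, a global pooling module summarizes a set of channels over the entire board (for example by mean and max pooling) and feeds that summary back as per-channel biases, so purely local convolutions can react to whole-board context. The PyTorch module below is an illustrative sketch of this idea, not KataGo's exact layer.

```python
# Illustrative global-pooling bias module: pool some channels over the whole
# board and use the result to bias other channels. KataGo's actual global
# pooling layers differ in detail.
import torch
from torch import nn

class GlobalPoolBias(nn.Module):
    def __init__(self, pool_channels, bias_channels):
        super().__init__()
        # mean + max pooling doubles the pooled feature size
        self.fc = nn.Linear(2 * pool_channels, bias_channels)

    def forward(self, pooled_input, biased_input):
        # pooled_input: (N, pool_channels, H, W)
        # biased_input: (N, bias_channels, H, W)
        mean = pooled_input.mean(dim=(2, 3))
        mx = pooled_input.amax(dim=(2, 3))
        summary = torch.cat([mean, mx], dim=1)         # whole-board summary
        bias = self.fc(summary)                        # per-channel bias
        return biased_input + bias[:, :, None, None]   # broadcast over board
```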

The network has two heads: a policy head and a value head. Both are mostly the same as in AlphaGo Zero, but each also has auxiliary subheads that provide additional loss signals for faster training, such as predicting the opponent's next move, the final ownership of each board point, and the final score.
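
A simplified picture of such a multi-head network with auxiliary losses is sketched below; the head shapes, targets, and loss weights are illustrative assumptions rather than KataGo's actual definitions.

```python
# Simplified multi-head design with auxiliary targets; the head shapes and
# loss weights here are illustrative, not KataGo's actual heads.
import torch
from torch import nn
import torch.nn.functional as F

class Heads(nn.Module):
    def __init__(self, channels, board_size=19):
        super().__init__()
        self.policy = nn.Conv2d(channels, 1, 1)     # main policy (per point)
        self.value = nn.Linear(channels, 3)         # win/loss/draw logits
        self.ownership = nn.Conv2d(channels, 1, 1)  # auxiliary: point ownership
        self.score = nn.Linear(channels, 1)         # auxiliary: score lead

    def forward(self, trunk):                       # trunk: (N, C, 19, 19)
        pooled = trunk.mean(dim=(2, 3))             # global average pool
        return {
            "policy": self.policy(trunk).flatten(1),    # (N, 361) move logits
            "value": self.value(pooled),
            "ownership": torch.tanh(self.ownership(trunk)),
            "score": self.score(pooled),
        }

def loss(out, targets, aux_weight=0.1):
    # Main losses plus down-weighted auxiliary losses for faster training.
    return (
        F.cross_entropy(out["policy"], targets["move"])
        + F.cross_entropy(out["value"], targets["result"])
        + aux_weight * F.mse_loss(out["ownership"], targets["ownership"])
        + aux_weight * F.mse_loss(out["score"], targets["score"])
    )
```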

The network is described in detail in Appendix A of the report. [1]

The code base switched from using TensorFlow to PyTorch in version 1.12.

Training

Network sizes are described by the number of residual blocks b and the number of channels c in the trunk, written "bXcY" for X blocks and Y channels. During the first training run, a sequence of networks of increasing size was trained. The run took 19 days, used a maximum of 28 Nvidia V100 GPUs, and generated about 4.2 million self-play games.

After the first training run, training became a distributed project run by volunteers, with increasing network sizes. As of August 2024, it has reached b28c512 (28 blocks, 512 channels).

Adversarial attacks

In 2022, KataGo was used as the target of adversarial attack research designed to demonstrate the "surprising failure modes" of AI systems. The researchers were able to trick KataGo into ending the game prematurely. [12] [13]

Adversarial training improves KataGo's defense against such attacks, though not completely. [14] [15]

References

  1. David Wu (27 February 2019). "Accelerating Self-Play Learning in Go". arXiv:1902.10565 [cs.LG].
  2. Wu, David J. (2015-01-01). "Designing a Winning Arimaa Program". ICGA Journal. 38 (1): 19–40. doi:10.3233/ICG-2015-38104. ISSN 1389-6911.
  3. David Wu (28 February 2019). "Accelerating Self-Play Learning in Go" (blog post). Retrieved 2 November 2022.
  4. "KataGo Distributed Training". Retrieved 2 November 2022.
  5. "KataGo GTP Extensions". GitHub. Retrieved 3 January 2023.
  6. "KaTrain". GitHub (GitHub repo). Retrieved 3 January 2023.
  7. 金雷 (1 March 2021). "AI当道 中国围棋优势缩小了吗?" [With the dominance of AI, is China's Go superiority shrinking?]. Xinmin Evening News. Retrieved 5 December 2021.
  8. Hong-ryeol Lee (14 April 2020). "'AI 기사' 격전장에 괴물 '블랙홀'이 등장했다" [A monster 'black hole' appeared in the battlefield of 'AI Go players']. The Chosun Ilbo. Retrieved 8 December 2021.
  9. "AI Sensei FAQ". Retrieved 2 November 2022.
  10. Anoek (OGS developer) (31 March 2022). "Considering removing Leela Zero from our supported AI Reviewers". Retrieved 2 November 2022.
  11. David Wu (15 November 2020). "Other Methods Implemented in KataGo". GitHub. Retrieved 4 November 2022.
  12. Benj Edwards (7 November 2022). "New Go-playing trick defeats world-class Go AI, but loses to human amateurs". Retrieved 8 November 2022.
  13. Wang, Tony Tong; Gleave, Adam; Tseng, Tom; Pelrine, Kellin; Belrose, Nora; Miller, Joseph; Dennis, Michael D.; Duan, Yawen; Pogrebniak, Viktor; Levine, Sergey; Russell, Stuart (2023-07-03). "Adversarial Policies Beat Superhuman Go AIs". Proceedings of the 40th International Conference on Machine Learning. PMLR: 35655–35739.
  14. Hutson, Matthew (2024-07-08). "Can AI be superhuman? Flaws in top gaming bot cast doubt". Nature. doi:10.1038/d41586-024-02218-7.
  15. Tseng, Tom; McLean, Euan; Pelrine, Kellin; Wang, Tony T.; Gleave, Adam (2024-06-18). "Can Go AIs be adversarially robust?". doi:10.48550/arXiv.2406.12843. Retrieved 2024-09-21.