Monte Carlo tree search


In computer science, Monte Carlo tree search (MCTS) is a heuristic search algorithm for some kinds of decision processes, most notably those employed in software that plays board games. In that context MCTS is used to solve the game tree.


MCTS was combined with neural networks in 2016 [1] and has been used in multiple board games like Chess, Shogi, [2] Checkers, Backgammon, Contract Bridge, Go, Scrabble, and Clobber [3] as well as in turn-based strategy video games (such as Total War: Rome II's implementation in the high-level campaign AI [4]) and applications outside of games. [5]

History

Monte Carlo method

The Monte Carlo method, which uses random sampling for deterministic problems which are difficult or impossible to solve using other approaches, dates back to the 1940s. [6] In his 1987 PhD thesis, Bruce Abramson combined minimax search with an expected-outcome model based on random game playouts to the end, instead of the usual static evaluation function. Abramson said the expected-outcome model "is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent." [7] He experimented in-depth with tic-tac-toe and then with machine-generated evaluation functions for Othello and chess.

Such methods were then explored and successfully applied to heuristic search in the field of automated theorem proving by W. Ertel, J. Schumann and C. Suttner in 1989, [8] [9] [10] thus improving the exponential search times of uninformed search algorithms such as breadth-first search, depth-first search, or iterative deepening.

In 1992, B. Brügmann employed it for the first time in a Go-playing program. [11] In 2002, Chang et al. [12] proposed the idea of "recursive rolling out and backtracking" with "adaptive" sampling choices in their Adaptive Multi-stage Sampling (AMS) algorithm for the model of Markov decision processes. AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT (Upper Confidence Trees). [13]

Monte Carlo tree search (MCTS)

The rating of the best Go-playing programs on the KGS server since 2007. Since 2006, all the best programs use Monte Carlo tree search. [14]

In 2006, inspired by these predecessors, [15] Rémi Coulom described the application of the Monte Carlo method to game-tree search and coined the name Monte Carlo tree search, [16] L. Kocsis and Cs. Szepesvári developed the UCT (Upper Confidence bounds applied to Trees) algorithm, [17] and S. Gelly et al. implemented UCT in their program MoGo. [18] In 2008, MoGo achieved dan (master) level in 9×9 Go, [19] and the Fuego program began to win against strong amateur players in 9×9 Go. [20]

In January 2012, the Zen program won 3:1 in a Go match on a 19×19 board with an amateur 2 dan player. [21] Google DeepMind developed the program AlphaGo, which in October 2015 became the first computer Go program to beat a professional human Go player without handicaps on a full-sized 19×19 board. [1] [22] [23] In March 2016, AlphaGo was awarded an honorary 9-dan (master) level in 19×19 Go for defeating Lee Sedol in a five-game match with a final score of four games to one. [24] AlphaGo represents a significant improvement over previous Go programs as well as a milestone in machine learning as it uses Monte Carlo tree search with artificial neural networks (a deep learning method) for policy (move selection) and value, giving it efficiency far surpassing previous programs. [25]

The MCTS algorithm has also been used in programs that play other board games (for example Hex, [26] Havannah, [27] Game of the Amazons, [28] and Arimaa [29] ), real-time video games (for instance Ms. Pac-Man [30] [31] and Fable Legends [32] ), and nondeterministic games (such as skat, [33] poker, [34] Magic: The Gathering, [35] or Settlers of Catan [36] ).

Principle of operation

The focus of MCTS is on the analysis of the most promising moves, expanding the search tree based on random sampling of the search space. The application of Monte Carlo tree search in games is based on many playouts, also called roll-outs. In each playout, the game is played out to the very end by selecting moves at random. The final game result of each playout is then used to weight the nodes in the game tree so that better nodes are more likely to be chosen in future playouts.

The most basic way to use playouts is to apply the same number of playouts after each legal move of the current player, then choose the move which led to the most victories. [11] The efficiency of this method, called Pure Monte Carlo Game Search, often increases with time as more playouts are assigned to the moves that have frequently resulted in the current player's victory according to previous playouts. Each round of Monte Carlo tree search consists of four steps: [37]

  1. Selection: Start from the root R and select successive child nodes until a leaf node L is reached. The root is the current game state and a leaf is any node that has a potential child from which no simulation (playout) has yet been initiated. The section below says more about a way of biasing the choice of child nodes that lets the game tree expand towards the most promising moves, which is the essence of Monte Carlo tree search.
  2. Expansion: Unless L ends the game decisively (e.g. win/loss/draw) for either player, create one (or more) child nodes and choose node C from one of them. Child nodes are any valid moves from the game position defined by L.
  3. Simulation: Complete one random playout from node C. This step is sometimes also called playout or rollout. A playout may be as simple as choosing uniformly random moves until the game is decided (for example in chess, the game is won, lost, or drawn).
  4. Backpropagation: Use the result of the playout to update information in the nodes on the path from C to R.
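As a rough illustration of how the four steps fit together, here is a minimal Python sketch of one search round. The `GameState` interface (with hypothetical `legal_moves()`, `play(move)`, `is_terminal()`, `winner()` and `player_just_moved` members) is an assumption made for this example, not taken from any cited implementation; the child-selection rule anticipates the UCT formula discussed later in the article.

```python
import math
import random


class Node:
    """One node of the search tree: a game state plus win/visit statistics."""

    def __init__(self, state, parent=None, move=None):
        self.state = state                      # hypothetical GameState object
        self.parent = parent
        self.move = move                        # move that led to this node
        self.children = []
        self.untried_moves = list(state.legal_moves())
        self.wins = 0.0                         # numerator (draws count 0.5)
        self.visits = 0                         # denominator


def one_round(root, c=math.sqrt(2)):
    """One round of MCTS: selection, expansion, simulation, backpropagation."""
    # 1. Selection: descend through fully expanded nodes, following the child
    #    with the highest UCT value (the rule discussed later in the article).
    node = root
    while not node.untried_moves and node.children:
        node = max(
            node.children,
            key=lambda ch: ch.wins / ch.visits
            + c * math.sqrt(math.log(node.visits) / ch.visits),
        )

    # 2. Expansion: unless this node ends the game, add one untried child.
    if node.untried_moves:
        move = node.untried_moves.pop()
        node.children.append(Node(node.state.play(move), parent=node, move=move))
        node = node.children[-1]

    # 3. Simulation: play uniformly random moves to the very end of the game.
    state = node.state
    while not state.is_terminal():
        state = state.play(random.choice(list(state.legal_moves())))
    winner = state.winner()                     # assumed: winning player, or None

    # 4. Backpropagation: update every node on the path back to the root.
    while node is not None:
        node.visits += 1
        if winner is None:
            node.wins += 0.5                    # a draw credits both players
        elif winner == node.state.player_just_moved:
            node.wins += 1.0                    # a win credits only that side
        node = node.parent
```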

Steps of Monte Carlo tree search.

This graph shows the steps involved in one decision, with each node showing the ratio of wins to total playouts from that point in the game tree for the player that the node represents. [38] In the Selection diagram, black is about to move. The root node shows there are 11 wins out of 21 playouts for white from this position so far. It complements the total of 10/21 black wins shown along the three black nodes under it, each of which represents a possible black move. Note that this graph does not follow the UCT algorithm described below.

If white loses the simulation, all nodes along the selection increment their simulation count (the denominator), but among them only the black nodes are credited with wins (the numerator). If instead white wins, all nodes along the selection still increment their simulation count, but among them only the white nodes are credited with wins. In games where draws are possible, a draw causes the numerator for both black and white to be incremented by 0.5 and the denominator by 1. This ensures that during selection, each player's choices expand towards the most promising moves for that player, which mirrors the goal of each player to maximize the value of their move.

Rounds of search are repeated as long as the time allotted to a move remains. Then the move with the most simulations made (i.e. the highest denominator) is chosen as the final answer.
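Continuing the sketch above (and assuming the same hypothetical `Node` and `one_round` helpers), the outer loop that repeats rounds until the move clock runs out and then answers with the most-simulated move could look like this:

```python
import time


def search(root_state, time_budget_seconds=1.0):
    """Repeat MCTS rounds while time remains, then pick the final answer."""
    root = Node(root_state)
    deadline = time.monotonic() + time_budget_seconds
    while time.monotonic() < deadline:
        one_round(root)
    # The final answer is the move with the most simulations (the highest
    # denominator), not the move with the highest win ratio.
    return max(root.children, key=lambda child: child.visits).move
```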

This basic procedure can be applied to any game whose positions necessarily have a finite number of moves and finite length. For each position, all feasible moves are determined: k random games are played out to the very end, and the scores are recorded. The move leading to the best score is chosen. Ties are broken by fair coin flips. Pure Monte Carlo Game Search results in strong play in several games with random elements, as in the game EinStein würfelt nicht!. It converges to optimal play (as k tends to infinity) in board-filling games with random turn order, for instance in the game Hex with random turn order. [39] DeepMind's AlphaZero replaces the simulation step with an evaluation based on a neural network. [2]
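The basic procedure itself fits in a few lines. The sketch below reuses the hypothetical `GameState` interface from the earlier examples plus an assumed `player_to_move` attribute and a `score(player)` method returning 1, 0.5 or 0; the constant k and the scoring are illustrative only.

```python
import random


def pure_monte_carlo_move(state, k=100):
    """Pure Monte Carlo Game Search: k random playouts after every legal move."""
    player = state.player_to_move
    averages = {}
    for move in state.legal_moves():
        total = 0.0
        for _ in range(k):
            playout = state.play(move)
            while not playout.is_terminal():
                playout = playout.play(random.choice(list(playout.legal_moves())))
            total += playout.score(player)      # 1 for a win, 0.5 draw, 0 loss
        averages[move] = total / k
    best = max(averages.values())
    # Ties between equally good moves are broken by a fair (uniform) choice.
    return random.choice([m for m, avg in averages.items() if avg == best])
```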

Exploration and exploitation

The main difficulty in selecting child nodes is maintaining some balance between the exploitation of deep variants after moves with a high average win rate and the exploration of moves with few simulations. The first formula for balancing exploitation and exploration in games, called UCT (Upper Confidence Bound 1 applied to trees), was introduced by Levente Kocsis and Csaba Szepesvári. [17] UCT is based on the UCB1 formula derived by Auer, Cesa-Bianchi, and Fischer [40] and the provably convergent AMS (Adaptive Multi-stage Sampling) algorithm first applied to multi-stage decision-making models (specifically, Markov Decision Processes) by Chang, Fu, Hu, and Marcus. [12] Kocsis and Szepesvári recommend choosing in each node of the game tree the move for which the expression $\frac{w_i}{n_i} + c\sqrt{\frac{\ln N_i}{n_i}}$ has the highest value. In this formula:

- $w_i$ stands for the number of wins for the node considered after the i-th move,
- $n_i$ stands for the number of simulations for the node considered after the i-th move,
- $N_i$ stands for the total number of simulations, after the i-th move, run by the parent node of the one considered,
- $c$ is the exploration parameter, theoretically equal to $\sqrt{2}$, but in practice usually chosen empirically.

The first component of the formula above corresponds to exploitation; it is high for moves with high average win ratio. The second component corresponds to exploration; it is high for moves with few simulations.
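Written as code, the same selection rule might look like the following minimal sketch (the field names match the Node skeleton used earlier; the default value of c is an assumption):

```python
import math


def uct_value(wins, visits, parent_visits, c=math.sqrt(2)):
    """UCT value of a child: exploitation term plus exploration term."""
    exploitation = wins / visits                                    # average win ratio
    exploration = c * math.sqrt(math.log(parent_visits) / visits)   # favors rarely tried moves
    return exploitation + exploration


def select_child(node, c=math.sqrt(2)):
    """Pick the child of `node` for which the UCT expression is highest."""
    return max(node.children,
               key=lambda ch: uct_value(ch.wins, ch.visits, node.visits, c))
```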

Most contemporary implementations of Monte Carlo tree search are based on some variant of UCT that traces its roots back to the AMS simulation optimization algorithm for estimating the value function in finite-horizon Markov Decision Processes (MDPs) introduced by Chang et al. [12] (2005) in Operations Research. (AMS was the first work to explore the idea of UCB-based exploration and exploitation in constructing sampled/simulated (Monte Carlo) trees and was the main seed for UCT. [13] )

Advantages and disadvantages

Although it has been proven that the evaluation of moves in Monte Carlo tree search converges to minimax when using UCT, [17] [41] the basic version of Monte Carlo tree search converges only in so-called "Monte Carlo Perfect" games. [42] However, Monte Carlo tree search does offer significant advantages over alpha–beta pruning and similar algorithms that minimize the search space.

In particular, pure Monte Carlo tree search does not need an explicit evaluation function. Simply implementing the game's mechanics (i.e. generating the allowed moves in a given position and detecting the end of the game) is sufficient to explore the search space. As such, Monte Carlo tree search can be employed in games without a developed theory or in general game playing.

The game tree in Monte Carlo tree search grows asymmetrically as the method concentrates on the more promising subtrees. Thus, it achieves better results than classical algorithms in games with a high branching factor.

A disadvantage is that in certain positions, there may be moves that look superficially strong, but that actually lead to a loss via a subtle line of play. Such "trap states" require thorough analysis to be handled correctly, particularly when playing against an expert player; however, MCTS may not "see" such lines due to its policy of selective node expansion. [43] [44] It is believed that this may have been part of the reason for AlphaGo's loss in its fourth game against Lee Sedol. In essence, the search attempts to prune sequences which are less relevant. In some cases, a play can lead to a very specific line of play which is significant, but which is overlooked when the tree is pruned, and this outcome is therefore "off the search radar". [45]

Improvements

Various modifications of the basic Monte Carlo tree search method have been proposed to shorten the search time. Some employ domain-specific expert knowledge, others do not.

Monte Carlo tree search can use either light or heavy playouts. Light playouts consist of random moves while heavy playouts apply various heuristics to influence the choice of moves. These heuristics may employ the results of previous playouts (e.g. the Last Good Reply heuristic [46] ) or expert knowledge of a given game. For instance, in many Go-playing programs certain stone patterns in a portion of the board influence the probability of moving into that area. [18] Paradoxically, playing suboptimally in simulations sometimes makes a Monte Carlo tree search program play stronger overall. [47]
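The distinction can be sketched as two playout policies; `heuristic_weight` below is a purely illustrative stand-in for pattern knowledge or statistics gathered from earlier playouts, not part of any cited program.

```python
import random


def light_playout(state):
    """Light playout: uniformly random moves to the end of the game."""
    while not state.is_terminal():
        state = state.play(random.choice(list(state.legal_moves())))
    return state


def heavy_playout(state, heuristic_weight):
    """Heavy playout: moves sampled in proportion to a heuristic weight.

    `heuristic_weight(state, move)` is a hypothetical stand-in for expert
    knowledge (e.g. stone patterns in Go) or results of previous playouts.
    """
    while not state.is_terminal():
        moves = list(state.legal_moves())
        weights = [heuristic_weight(state, m) for m in moves]
        state = state.play(random.choices(moves, weights=weights, k=1)[0])
    return state
```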

Patterns of hane (surrounding opponent stones) used in playouts by the MoGo program. It is advantageous for both black and white to put a stone on the middle square, except the rightmost pattern, where it favors black only.

Domain-specific knowledge may be employed when building the game tree to help the exploitation of some variants. One such method assigns nonzero priors to the number of won and played simulations when creating each child node, leading to artificially raised or lowered average win rates that cause the node to be chosen more or less frequently, respectively, in the selection step. [48] A related method, called progressive bias, consists in adding to the UCB1 formula a $\frac{b_i}{n_i}$ element, where $b_i$ is a heuristic score of the i-th move. [37]
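A minimal sketch of the progressive-bias variant, using the same field names as the earlier UCT example; where the heuristic score comes from (patterns, expert rules) is left unspecified and assumed to be supplied by the caller.

```python
import math


def uct_with_progressive_bias(child, parent_visits, heuristic_score, c=math.sqrt(2)):
    """UCB1 value of a child plus a progressive-bias term b_i / n_i.

    The influence of the heuristic score b_i fades as the number of
    simulations n_i of the child grows.
    """
    n = child.visits
    return (child.wins / n
            + c * math.sqrt(math.log(parent_visits) / n)
            + heuristic_score / n)
```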

The basic Monte Carlo tree search collects enough information to find the most promising moves only after many rounds; until then its moves are essentially random. This exploratory phase may be reduced significantly in a certain class of games using RAVE (Rapid Action Value Estimation). [48] In these games, permutations of a sequence of moves lead to the same position. Typically, they are board games in which a move involves placement of a piece or a stone on the board. In such games the value of each move is often only slightly influenced by other moves.

In RAVE, for a given game tree node N, its child nodes Ci store not only the statistics of wins in playouts started in node N but also the statistics of wins in all playouts started in node N and below it, if they contain move i (also when the move was played in the tree, between node N and a playout). This way the contents of tree nodes are influenced not only by moves played immediately in a given position but also by the same moves played later.

RAVE on the example of tic-tac-toe. In red nodes, the RAVE statistics will be updated after the b1-a2-b3 simulation.

When using RAVE, the selection step selects the node for which the modified UCB1 formula $(1-\beta(n_i, \tilde{n}_i))\frac{w_i}{n_i} + \beta(n_i, \tilde{n}_i)\frac{\tilde{w}_i}{\tilde{n}_i} + c\sqrt{\frac{\ln N_i}{n_i}}$ has the highest value. In this formula, $\tilde{w}_i$ and $\tilde{n}_i$ stand for the number of won playouts containing move i and the number of all playouts containing move i, and the function $\beta(n_i, \tilde{n}_i)$ should be close to one and to zero for relatively small and relatively big $n_i$ and $\tilde{n}_i$, respectively. One of many formulas for $\beta(n_i, \tilde{n}_i)$, proposed by D. Silver, [49] says that in balanced positions one can take $\beta(n_i, \tilde{n}_i) = \frac{\tilde{n}_i}{n_i + \tilde{n}_i + 4 b^2 n_i \tilde{n}_i}$, where b is an empirically chosen constant.
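A sketch of the RAVE scoring, assuming each child node additionally stores all-moves-as-first statistics (`amaf_wins`, `amaf_visits`, hypothetical field names) alongside the ordinary ones; the value of the constant b is illustrative.

```python
import math


def rave_value(wins, visits, amaf_wins, amaf_visits, parent_visits,
               b=0.1, c=math.sqrt(2)):
    """Modified UCB1 value mixing ordinary and RAVE (all-moves-as-first) statistics."""
    # Weight of the RAVE statistics: near 1 for few simulations, near 0 for many.
    beta = amaf_visits / (visits + amaf_visits
                          + 4 * b * b * visits * amaf_visits)
    exploitation = ((1 - beta) * wins / visits
                    + beta * amaf_wins / amaf_visits)
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration
```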

Heuristics used in Monte Carlo tree search often require many parameters. There are automated methods to tune the parameters to maximize the win rate. [50]

Monte Carlo tree search can be concurrently executed by many threads or processes. There are several fundamentally different methods of its parallel execution: [51]

  1. Leaf parallelization, i.e. parallel execution of many playouts from one leaf of the game tree.
  2. Root parallelization, i.e. building independent game trees in parallel and making the move based on the root-level branches of all these trees.
  3. Tree parallelization, i.e. parallel building of the same game tree, protecting data from simultaneous writes either with one global mutex, with more mutexes, or with non-blocking synchronization. [52]
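Of the three, root parallelization is the simplest to sketch: each worker builds its own tree from the same position and only the root-level statistics are combined. The minimal example below reuses the hypothetical `Node` and `one_round` helpers from earlier; it further assumes that moves are hashable and game states can be sent to worker processes.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def _worker(root_state, n_rounds):
    """Build an independent tree and report visit counts of the root moves."""
    root = Node(root_state)
    for _ in range(n_rounds):
        one_round(root)
    return {child.move: child.visits for child in root.children}


def root_parallel_search(root_state, n_workers=4, n_rounds=10_000):
    """Root parallelization: combine root-level statistics of independent trees."""
    totals = Counter()
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        for counts in pool.map(_worker, [root_state] * n_workers,
                               [n_rounds] * n_workers):
            totals.update(counts)
    # Choose the move with the most simulations summed over all trees.
    return totals.most_common(1)[0][0]
```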

See also

Related Research Articles

Minimax is a decision rule used in artificial intelligence, decision theory, game theory, statistics, and philosophy for minimizing the possible loss for a worst case scenario. When dealing with gains, it is referred to as "maximin" – to maximize the minimum gain. Originally formulated for several-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.

<span class="mw-page-title-main">Monte Carlo method</span> Probabilistic problem-solving algorithm

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. The name comes from the Monte Carlo Casino in Monaco, where the primary developer of the method, mathematician Stanislaw Ulam, was inspired by his uncle's gambling habits.

<span class="mw-page-title-main">Computer chess</span> Computer hardware and software capable of playing chess

Computer chess includes both hardware and software capable of playing chess. Computer chess provides opportunities for players to practice even in the absence of human opponents, and also provides opportunities for analysis, entertainment and training. Computer chess applications that play at the level of a chess grandmaster or higher are available on hardware from supercomputers to smart phones. Standalone chess-playing machines are also available. Stockfish, Leela Chess Zero, GNU Chess, Fruit, and other free open source applications are available for various platforms.

Alpha–beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player combinatorial games. It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

<span class="mw-page-title-main">Computer Go</span> Field of artificial intelligence around Go computer programs

Computer Go is the field of artificial intelligence (AI) dedicated to creating a computer program that plays the traditional board game Go. The field is sharply divided into two eras. Before 2015, the programs of the era were weak. The best efforts of the 1980s and 1990s produced only AIs that could be defeated by beginners, and AIs of the early 2000s were intermediate level at best. Professionals could defeat these programs even given handicaps of 10+ stones in favor of the AI. Many of the algorithms such as alpha-beta minimax that performed well as AIs for checkers and chess fell apart on Go's 19x19 board, as there were too many branching possibilities to consider. Creation of a human professional quality program with the techniques and hardware of the time was out of reach. Some AI researchers speculated that the problem was unsolvable without creation of human-like AI.

In statistics, Markov chain Monte Carlo (MCMC) is a class of algorithms used to draw samples from a probability distribution. Given a probability distribution, one can construct a Markov chain whose elements' distribution approximates it – that is, the Markov chain's equilibrium distribution matches the target distribution. The more steps that are included, the more closely the distribution of the sample matches the actual desired distribution.

In the context of combinatorial game theory, which typically studies sequential games with perfect information, a game tree is a graph representing all possible game states within such a game. Such games include well-known ones such as chess, checkers, Go, and tic-tac-toe. This can be used to measure the complexity of a game, as it represents all the possible ways a game can pan out. Due to the large game trees of complex games such as chess, algorithms that are designed to play this class of games will use partial game trees, which makes computation feasible on modern computers. Various methods exist to solve game trees. If a complete game tree can be generated, a deterministic algorithm, such as backward induction or retrograde analysis can be used. Randomized algorithms and minimax algorithms such as MCTS can be used in cases where a complete game tree is not feasible.

Combinatorial game theory measures game complexity in several ways:

  1. State-space complexity,
  2. Game tree size,
  3. Decision complexity,
  4. Game-tree complexity,
  5. Computational complexity.

Maven is an artificial intelligence Scrabble player, created by Brian Sheppard in 1986. It is no longer commercially available, the rights having been bought by Hasbro in 1996. It has been used in official licensed Hasbro Scrabble games since then.

A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of randomness determined by the random bits; thus either the running time, or the output, or both are random variables.

The expectiminimax algorithm is a variation of the minimax algorithm, for use in artificial intelligence systems that play two-player zero-sum games, such as backgammon, in which the outcome depends on a combination of the player's skill and chance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" nodes, which take the expected value of a random event occurring. In game theory terms, an expectiminimax tree is the game tree of an extensive-form game of perfect, but incomplete information.

Random forests or random decision forests is an ensemble learning method for classification, regression and other tasks that works by creating a multitude of decision trees during training. For classification tasks, the output of the random forest is the class selected by most trees. For regression tasks, the output is the average of the predictions of the trees. Random forests correct for decision trees' habit of overfitting to their training set.

General game playing (GGP) is the design of artificial intelligence programs to be able to play more than one game successfully. For many games like chess, computers are programmed to play these games using a specially designed algorithm, which cannot be transferred to another context. For instance, a chess-playing computer program cannot play checkers. General game playing is considered as a necessary milestone on the way to artificial general intelligence.

<span class="mw-page-title-main">Anti-computer tactics</span> Human methods against game-playing computers

Anti-computer tactics are methods used by humans to try to beat computer opponents at various games, most typically board games such as chess and Arimaa. They are most associated with competitions against computer AIs that are playing to their utmost to win, rather than AIs merely programmed to be an interesting challenge that can be given intentional weaknesses and quirks by the programmer. Such tactics are most associated with the era when AIs searched a game tree with an evaluation function looking for promising moves, often with Alpha–beta pruning or other minimax algorithms used to narrow the search. Against such algorithms, a common tactic is to play conservatively aiming for a long-term advantage. The theory is that this advantage will manifest slowly enough that the computer is unable to notice in its search, and the computer won't play around the threat correctly. This may result in, for example, a subtle advantage that eventually turns into a winning chess endgame with a passed pawn.

Non-uniform random variate generation or pseudo-random number sampling is the numerical practice of generating pseudo-random numbers (PRN) that follow a given probability distribution. Methods are typically based on the availability of a uniformly distributed PRN generator. Computational algorithms are then used to manipulate a single random variate, X, or often several such variates, into a new random variate Y such that these values have the required distribution. The first methods were developed for Monte-Carlo simulations in the Manhattan project, published by John von Neumann in the early 1950s.

AlphaGo is a computer program that plays the board game Go. It was developed by the London-based DeepMind Technologies, an acquired subsidiary of Google. Subsequent versions of AlphaGo became increasingly powerful, including a version that competed under the name Master. After retiring from competitive play, AlphaGo Master was succeeded by an even more powerful version known as AlphaGo Zero, which was completely self-taught without learning from human games. AlphaGo Zero was then generalized into a program known as AlphaZero, which played additional games, including chess and shogi. AlphaZero has in turn been succeeded by a program known as MuZero which learns without being taught the rules.

Darkforest is a computer Go program developed by Meta Platforms, based on deep learning techniques using a convolutional neural network. Its updated version Darkfores2 combines the techniques of its predecessor with Monte Carlo tree search. The MCTS effectively takes tree search methods commonly seen in computer chess programs and randomizes them. With the update, the system is known as Darkfmcts3.

Rémi Coulom is a French computer scientist, once an assistant professor of computer science at the Lille 3 University, and the developer of Crazy Stone, a computer Go program.

<span class="mw-page-title-main">Ultimate tic-tac-toe</span> Variant of tic-tac-toe game

Ultimate tic-tac-toe is a board game composed of nine tic-tac-toe boards arranged in a 3 × 3 grid. Players take turns playing on the smaller tic-tac-toe boards until one of them wins on the larger board. Compared to traditional tic-tac-toe, strategy in this game is conceptually more difficult and has proven more challenging for computers.

<span class="mw-page-title-main">AlphaZero</span> Game-playing artificial intelligence

AlphaZero is a computer program developed by artificial intelligence research company DeepMind to master the games of chess, shogi and go. This algorithm uses an approach similar to AlphaGo Zero.

References

  1. Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; Driessche, George van den; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis (28 January 2016). "Mastering the game of Go with deep neural networks and tree search". Nature. 529 (7587): 484–489. Bibcode:2016Natur.529..484S. doi:10.1038/nature16961. ISSN 0028-0836. PMID 26819042. S2CID 515925.
  2. Silver, David (2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815v1 [cs.AI].
  3. Rajkumar, Prahalad. "A Survey of Monte-Carlo Techniques in Games" (PDF). cs.umd.edu. Archived (PDF) from the original on 2023-04-07.
  4. "Monte-Carlo Tree Search in TOTAL WAR: ROME II's Campaign AI". AI Game Dev. Archived from the original on 13 March 2017. Retrieved 25 February 2017.
  5. Kemmerling, Marco; Lütticke, Daniel; Schmitt, Robert H. (1 January 2024). "Beyond games: a systematic review of neural Monte Carlo tree search applications". Applied Intelligence. 54 (1): 1020–1046. arXiv: 2303.08060 . doi:10.1007/s10489-023-05240-w. ISSN   1573-7497.
  6. Metropolis, Nicholas; Ulam, Stanislaw (1949). "The Monte Carlo method". Journal of the American Statistical Association. 44 (247): 335–341. doi:10.1080/01621459.1949.10483310. PMID 18139350.
  7. Abramson, Bruce (1987). The Expected-Outcome Model of Two-Player Games (PDF). Technical report, Department of Computer Science, Columbia University. Retrieved 23 December 2013.
  8. Wolfgang Ertel; Johann Schumann; Christian Suttner (1989). "Learning Heuristics for a Theorem Prover using Back Propagation.". In J. Retti; K. Leidlmair (eds.). 5. Österreichische Artificial-Intelligence-Tagung. Informatik-Fachberichte 208, pp. 87-95. Springer. Archived from the original on 2021-04-15. Retrieved 2016-08-14.
  9. Christian Suttner; Wolfgang Ertel (1990). "Automatic Acquisition of Search Guiding Heuristics.". CADE90, 10th Int. Conf. on Automated Deduction.pp. 470-484. LNAI 449. Springer. Archived from the original on 2021-04-15. Retrieved 2016-08-14.
  10. Christian Suttner; Wolfgang Ertel (1991). "Using Back-Propagation Networks for Guiding the Search of a Theorem Prover". Journal of Neural Networks Research & Applications. 2 (1): 3–16. Archived from the original on 2021-04-15. Retrieved 2016-08-14.
  11. Brügmann, Bernd (1993). Monte Carlo Go (PDF). Technical report, Department of Physics, Syracuse University.
  12. Chang, Hyeong Soo; Fu, Michael C.; Hu, Jiaqiao; Marcus, Steven I. (2005). "An Adaptive Sampling Algorithm for Solving Markov Decision Processes" (PDF). Operations Research. 53: 126–139. doi:10.1287/opre.1040.0145. hdl:1903/6264. Archived from the original (PDF) on 2021-04-20. Retrieved 2016-02-25.
  13. Hyeong Soo Chang; Michael Fu; Jiaqiao Hu; Steven I. Marcus (2016). "Google DeepMind's Alphago: O.R.'s unheralded role in the path-breaking achievement". OR/MS Today. 45 (5): 24–29.
  14. "Sensei's Library: KGSBotRatings" (PDF). Archived from the original on 2009-06-25. Retrieved 2012-05-03.
  15. Rémi Coulom (2008). "The Monte-Carlo Revolution in Go" (PDF). Japanese-French Frontiers of Science Symposium.
  16. Rémi Coulom (2007). "Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search". Computers and Games, 5th International Conference, CG 2006, Turin, Italy, May 29–31, 2006. Revised Papers. H. Jaap van den Herik, Paolo Ciancarini, H. H. L. M. Donkers (eds.). Springer. pp. 72–83. CiteSeerX   10.1.1.81.6817 . ISBN   978-3-540-75537-1.
  17. Kocsis, Levente; Szepesvári, Csaba (2006). "Bandit based Monte-Carlo Planning". In Fürnkranz, Johannes; Scheffer, Tobias; Spiliopoulou, Myra (eds.). Machine Learning: ECML 2006, 17th European Conference on Machine Learning, Berlin, Germany, September 18–22, 2006, Proceedings. Lecture Notes in Computer Science. Vol. 4212. Springer. pp. 282–293. CiteSeerX 10.1.1.102.1296. doi:10.1007/11871842_29. ISBN 3-540-45375-X.
  18. Sylvain Gelly; Yizao Wang; Rémi Munos; Olivier Teytaud (November 2006). Modification of UCT with Patterns in Monte-Carlo Go (PDF). Technical report, INRIA.
  19. Chang-Shing Lee; Mei-Hui Wang; Guillaume Chaslot; Jean-Baptiste Hoock; Arpad Rimmel; Olivier Teytaud; Shang-Rong Tsai; Shun-Chin Hsu; Tzung-Pei Hong (2009). "The Computational Intelligence of MoGo Revealed in Taiwan's Computer Go Tournaments" (PDF). IEEE Transactions on Computational Intelligence and AI in Games. 1 (1): 73–89. CiteSeerX   10.1.1.470.6018 . doi:10.1109/tciaig.2009.2018703. S2CID   15266518.
  20. Markus Enzenberger; Martin Müller (2008). Fuego – An Open-Source Framework for Board Games and Go Engine Based on Monte Carlo Tree Search (PDF). Technical report, University of Alberta.
  21. "The Shodan Go Bet" . Retrieved 2012-05-02.
  22. "Research Blog: AlphaGo: Mastering the ancient game of Go with Machine Learning". Google Research Blog. 27 January 2016.
  23. "Google achieves AI 'breakthrough' by beating Go champion". BBC News. 27 January 2016.
  24. "Match 1 - Google DeepMind Challenge Match: Lee Sedol vs AlphaGo". Youtube. 9 March 2016.
  25. "Google AlphaGo AI clean sweeps European Go champion". ZDNet. 28 January 2016.
  26. Broderick Arneson; Ryan Hayward; Philip Henderson (June 2009). "MoHex Wins Hex Tournament" (PDF). ICGA Journal . 32 (2): 114–116. doi:10.3233/ICG-2009-32218.
  27. Timo Ewalds (2011). Playing and Solving Havannah (PDF). Master's thesis, University of Alberta.
  28. Richard J. Lorentz (2008). "Amazons Discover Monte-Carlo". Computers and Games, 6th International Conference, CG 2008, Beijing, China, September 29 – October 1, 2008. Proceedings. H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, Mark H. M. Winands (eds.). Springer. pp. 13–24. ISBN   978-3-540-87607-6.
  29. Tomáš Kozelek (2009). Methods of MCTS and the game Arimaa (PDF). Master's thesis, Charles University in Prague.
  30. Xiaocong Gan; Yun Bao; Zhangang Han (December 2011). "Real-Time Search Method in Nondeterministic Game – Ms. Pac-Man". ICGA Journal . 34 (4): 209–222. doi:10.3233/ICG-2011-34404.
  31. Tom Pepels; Mark H. M. Winands; Marc Lanctot (September 2014). "Real-Time Monte Carlo Tree Search in Ms Pac-Man". IEEE Transactions on Computational Intelligence and AI in Games. 6 (3): 245–257. doi: 10.1109/tciaig.2013.2291577 .
  32. Mountain, Gwaredd (2015). "Tactical Planning and Real-time MCTS in Fable Legends". Archived from the original on 2019-06-08. Retrieved 2019-06-08. .. we implemented a simulation based approach, which involved modelling the game play and using MCTS to search the potential plan space. Overall this worked well, ...
  33. Michael Buro; Jeffrey Richard Long; Timothy Furtak; Nathan R. Sturtevant (2009). "Improving State Evaluation, Inference, and Search in Trick-Based Card Games". IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11–17, 2009. Craig Boutilier (ed.). pp. 1407–1413. CiteSeerX   10.1.1.150.3077 .
  34. Jonathan Rubin; Ian Watson (April 2011). "Computer poker: A review". Artificial Intelligence. 175 (5–6): 958–987. doi: 10.1016/j.artint.2010.12.005 .
  35. C.D. Ward; P.I. Cowling (2009). "Monte Carlo Search Applied to Card Selection in Magic: The Gathering" (PDF). CIG'09 Proceedings of the 5th international conference on Computational Intelligence and Games. IEEE Press. Archived from the original (PDF) on 2016-05-28.
  36. István Szita; Guillaume Chaslot; Pieter Spronck (2010). "Monte-Carlo Tree Search in Settlers of Catan" (PDF). In Jaap Van Den Herik; Pieter Spronck (eds.). Advances in Computer Games, 12th International Conference, ACG 2009, Pamplona, Spain, May 11–13, 2009. Revised Papers. Springer. pp. 21–32. ISBN   978-3-642-12992-6. Archived from the original (PDF) on 2016-03-04. Retrieved 2015-11-30.
  37. G.M.J.B. Chaslot; M.H.M. Winands; J.W.H.M. Uiterwijk; H.J. van den Herik; B. Bouzy (2008). "Progressive Strategies for Monte-Carlo Tree Search" (PDF). New Mathematics and Natural Computation. 4 (3): 343–359. doi:10.1142/s1793005708001094.
  38. Bradberry, Jeff (2015-09-07). "Introduction to Monte Carlo Tree Search".
  39. Peres, Yuval; Schramm, Oded; Sheffield, Scott; Wilson, David B. (2006). "Random-Turn Hex and other selection games". arXiv: math/0508580 .
  40. Auer, Peter; Cesa-Bianchi, Nicolò; Fischer, Paul (2002). "Finite-time Analysis of the Multiarmed Bandit Problem". Machine Learning. 47 (2/3): 235–256. doi: 10.1023/a:1013689704352 . S2CID   207609497.
  41. Browne, Cameron B.; Powley, Edward; Whitehouse, Daniel; Lucas, Simon M.; Cowling, Peter I.; Rohlfshagen, Philipp; Tavener, Stephen; Perez, Diego; Samothrakis, Spyridon; Colton, Simon (2012). "A Survey of Monte Carlo Tree Search Methods". IEEE Transactions on Computational Intelligence and AI in Games. 4 (1): 1–43. doi:10.1109/tciaig.2012.2186810. ISSN   1943-068X.
  42. Althöfer, Ingo (2012). "On Board-Filling Games with Random-Turn Order and Monte Carlo Perfectness". Advances in Computer Games. Lecture Notes in Computer Science. Vol. 7168. pp. 258–269. doi:10.1007/978-3-642-31866-5_22. ISBN   978-3-642-31865-8.
  43. Ramanujan, Raghuram; Sabharwal, Ashish; Selman, Bart (May 2010). "On adversarial search spaces and sampling-based planning". ICAPS '10: Proceedings of the Twentieth International Conference on International Conference on Automated Planning and Scheduling. Icaps'10: 242–245.
  44. Ramanujan, Raghuram; Selman, Bart (March 2011). "Trade-Offs in Sampling-Based Adversarial Planning". Proceedings of the International Conference on Automated Planning and Scheduling. 21 (1): 202–209. doi: 10.1609/icaps.v21i1.13472 . S2CID   45147586.
  45. "Lee Sedol defeats AlphaGo in masterful comeback - Game 4". Go Game Guru. Archived from the original on 2016-11-16. Retrieved 2017-07-04.
  46. Drake, Peter (December 2009). "The Last-Good-Reply Policy for Monte-Carlo Go". ICGA Journal. 32 (4): 221–227. doi:10.3233/ICG-2009-32404.
  47. Seth Pellegrino; Peter Drake (2010). "Investigating the Effects of Playout Strength in Monte-Carlo Go". Proceedings of the 2010 International Conference on Artificial Intelligence, ICAI 2010, July 12–15, 2010, Las Vegas Nevada, USA. Hamid R. Arabnia, David de la Fuente, Elena B. Kozerenko, José Angel Olivas, Rui Chang, Peter M. LaMonica, Raymond A. Liuzzi, Ashu M. G. Solo (eds.). CSREA Press. pp. 1015–1018. ISBN   978-1-60132-148-0.
  48. Gelly, Sylvain; Silver, David (2007). "Combining Online and Offline Knowledge in UCT" (PDF). Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20–24, 2007. Zoubin Ghahramani (ed.). ACM. pp. 273–280. ISBN 978-1-59593-793-3. Archived from the original (PDF) on 2017-08-28.
  49. David Silver (2009). Reinforcement Learning and Simulation-Based Search in Computer Go (PDF). PhD thesis, University of Alberta.
  50. Rémi Coulom. "CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning". ACG 2011: Advances in Computer Games 13 Conference, Tilburg, the Netherlands, November 20–22.
  51. Guillaume M.J-B. Chaslot; Mark H.M. Winands; Jaap van den Herik (2008). "Parallel Monte-Carlo Tree Search" (PDF). Computers and Games, 6th International Conference, CG 2008, Beijing, China, September 29 – October 1, 2008. Proceedings. H. Jaap van den Herik, Xinhe Xu, Zongmin Ma, Mark H. M. Winands (eds.). Springer. pp. 60–71. ISBN 978-3-540-87607-6.
  52. Markus Enzenberger; Martin Müller (2010). "A Lock-free Multithreaded Monte-Carlo Tree Search Algorithm". In Jaap Van Den Herik; Pieter Spronck (eds.). Advances in Computer Games: 12th International Conference, ACG 2009, Pamplona, Spain, May 11–13, 2009, Revised Papers . Springer. pp.  14–20. CiteSeerX   10.1.1.161.1984 . ISBN   978-3-642-12992-6.

Bibliography