Expectiminimax

Last updated
Expectiminimax
Class Search algorithm
Worst-case performance , where is the number of distinct dice throws
Best-case performance , in case all dice throws are known in advance

The expectiminimax algorithm is a variation of the minimax algorithm, for use in artificial intelligence systems that play two-player zero-sum games, such as backgammon, in which the outcome depends on a combination of the player's skill and chance elements such as dice rolls. In addition to "min" and "max" nodes of the traditional minimax tree, this variant has "chance" ("move by nature") nodes, which take the expected value of a random event occurring. [1] In game theory terms, an expectiminimax tree is the game tree of an extensive-form game of perfect, but incomplete information.

Contents

In the traditional minimax method, the levels of the tree alternate from max to min until the depth limit of the tree has been reached. In an expectiminimax tree, the "chance" nodes are interleaved with the max and min nodes. Instead of taking the max or min of the utility values of their children, chance nodes take a weighted average, with the weight being the probability that child is reached. [1]

The interleaving depends on the game. Each "turn" of the game is evaluated as a "max" node (representing the AI player's turn), a "min" node (representing a potentially-optimal opponent's turn), or a "chance" node (representing a random effect or player). [1]

For example, consider a game in which each round consists of a single die throw, and then decisions made by first the AI player, and then another intelligent opponent. The order of nodes in this game would alternate between "chance", "max" and then "min". [1]

Pseudocode

The expectiminimax algorithm is a variant of the minimax algorithm and was firstly proposed by Donald Michie in 1966. [2] Its pseudocode is given below.

function expectiminimax(node, depth)     if node is a terminal node or depth = 0         return the heuristic value of node     if the adversary is to play at node         // Return value of minimum-valued child node         let α := +∞         foreach child of node             α := min(α, expectiminimax(child, depth-1))     else if we are to play at node         // Return value of maximum-valued child node         let α := -∞         foreach child of node             α := max(α, expectiminimax(child, depth-1))     else if random event at node         // Return weighted average of all child nodes' values         let α := 0         foreach child of node             α := α + (Probability[child] × expectiminimax(child, depth-1))     return α

Note that for random nodes, there must be a known probability of reaching each child. (For most games of chance, child nodes will be equally-weighted, which means the return value can simply be the average of all child values.)

Expectimax search is a variant described in Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability (2005) by Tom Everitt and Marcus Hutter.

Alpha-beta pruning

Bruce Ballard was the first to develop a technique, called *-minimax, that enables alpha-beta pruning in expectiminimax trees. [3] [4] The problem with integrating alpha-beta pruning into the expectiminimax algorithm is that the scores of a chance node's children may exceed the alpha or beta bound of its parent, even if the weighted value of each child does not. However, it is possible to bound the scores of a chance node's children, and therefore bound the score of the CHANCE node.

If a standard iterative search is about to score the th child of a chance node with equally likely children, that search has computed scores for child nodes 1 through . Assuming a lowest possible score and a highest possible score for each unsearched child, the bounds of the chance node's score is as follows:

If an alpha and/or beta bound are given in scoring the chance node, these bounds can be used to cut off the search of the th child. The above equations can be rearranged to find a new alpha & beta value that will cut off the search if it would cause the chance node to exceed its own alpha and beta bounds:

The pseudocode for extending expectiminimax with fail-hard alpha-beta pruning in this manner is as follows:

function *-minimax(node, depth, α, β)     if node is a terminal node or depth = 0         return the heuristic value of node     if node is a max or min node         return the minimax value of the node     let N = numSuccessors(node)     // Compute α, β for children     let A = N * (α - U) + U     let B = N * (β - L) + L     let sum = 0     foreach child of node         // Limit child α, β to a valid range         let AX = max(A, L)         let BX = min(B, U)         // Search the child with new cutoff values         let score = *-minimax(child, depth - 1, AX, BX)         // Check for α, β cutoff conditions         if score <= A             return α         if score >= B             return β         sum += score         // Adjust α, β for the next child         A += U - v         B += L - v     // No cutoff occurred, return score     return sum / N

This technique is one of a family of variants of algorithms which can bound the search of a CHANCE node and its children based on collecting lower and upper bounds of the children during search. Other techniques which can offer performance benefits include probing each child with a heuristic to establish a min or max before performing a full search on each child, etc.

See also

Related Research Articles

Minmax is a decision rule used in artificial intelligence, decision theory, game theory, statistics, and philosophy for minimizing the possible loss for a worst case scenario. When dealing with gains, it is referred to as "maximin" – to maximize the minimum gain. Originally formulated for several-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.

In statistics, a statistic is sufficient with respect to a statistical model and its associated unknown parameter if "no other statistic that can be calculated from the same sample provides any additional information as to the value of the parameter". In particular, a statistic is sufficient for a family of probability distributions if the sample from which it is calculated gives no additional information than the statistic, as to which of those probability distributions is the sampling distribution.

Alpha–beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player combinatorial games. It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.

<span class="mw-page-title-main">Gumbel distribution</span> Particular case of the generalized extreme value distribution

In probability theory and statistics, the Gumbel distribution is used to model the distribution of the maximum of a number of samples of various distributions.

In computer science, iterative deepening search or more specifically iterative deepening depth-first search is a state space/graph search strategy in which a depth-limited version of depth-first search is run repeatedly with increasing depth limits until the goal is found. IDDFS is optimal, meaning that it finds the shallowest goal. Since it visits all the nodes in the search tree down to depth before visiting any nodes at depth , the cumulative order in which nodes are first visited is effectively the same as in breadth-first search. However, IDDFS uses much less memory.

In linear algebra and functional analysis, the min-max theorem, or variational theorem, or Courant–Fischer–Weyl min-max principle, is a result that gives a variational characterization of eigenvalues of compact Hermitian operators on Hilbert spaces. It can be viewed as the starting point of many results of similar nature.

<span class="mw-page-title-main">Gauss–Newton algorithm</span> Mathematical algorithm

The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations. It has the advantage that second derivatives, which can be challenging to compute, are not required.

In the field of mathematics, norms are defined for elements within a vector space. Specifically, when the vector space comprises matrices, such norms are referred to as matrix norms. Matrix norms differ from vector norms in that they must also interact with matrix multiplication.

Negamax search is a variant form of minimax search that relies on the zero-sum property of a two-player game.

In mathematics, the resultant of two polynomials is a polynomial expression of their coefficients that is equal to zero if and only if the polynomials have a common root, or, equivalently, a common factor. In some older texts, the resultant is also called the eliminant.

Principal variation search is a negamax algorithm that can be faster than alpha–beta pruning. Like alpha–beta pruning, NegaScout is a directional search algorithm for computing the minimax value of a node in a tree. It dominates alpha–beta pruning in the sense that it will never examine a node that can be pruned by alpha–beta; however, it relies on accurate node ordering to capitalize on this advantage.

MTD(f) is an alpha-beta game tree search algorithm modified to use ‘zero-window’ initial search bounds, and memory (usually a transposition table) to reuse intermediate search results. MTD(f) is a shortened form of MTD(n,f) which stands for Memory-enhanced Test Driver with node ‘n’ and value ‘f’. The efficacy of this paradigm depends on a good initial guess, and the supposition that the final minimax value lies in a narrow window around the guess (which becomes an upper/lower bound for the search from root). The memory structure is used to save an initial guess determined elsewhere.

<span class="mw-page-title-main">Alpha max plus beta min algorithm</span> High-speed approximation of the square root of the sum of two squares

The alpha max plus beta min algorithm is a high-speed approximation of the square root of the sum of two squares. The square root of the sum of two squares, also known as Pythagorean addition, is a useful function, because it finds the hypotenuse of a right triangle given the two side lengths, the norm of a 2-D vector, or the magnitude of a complex number z = a + bi given the real and imaginary parts.

SSS* is a search algorithm, introduced by George Stockman in 1979, that conducts a state space search traversing a game tree in a best-first fashion similar to that of the A* search algorithm.

Collective operations are building blocks for interaction patterns, that are often used in SPMD algorithms in the parallel programming context. Hence, there is an interest in efficient realizations of these operations.

In computer science, the range query problem consists of efficiently answering several queries regarding a given interval of elements within an array. For example, a common task, known as range minimum query, is finding the smallest value inside a given range within a list of numbers.

In coding theory, list decoding is an alternative to unique decoding of error-correcting codes in the presence of many errors. If a code has relative distance , then it is possible in principle to recover an encoded message when up to fraction of the codeword symbols are corrupted. But when error rate is greater than , this will not in general be possible. List decoding overcomes that issue by allowing the decoder to output a short list of messages that might have been encoded. List decoding can correct more than fraction of errors.

In mathematics, the mean (topological) dimension of a topological dynamical system is a non-negative extended real number that is a measure of the complexity of the system. Mean dimension was first introduced in 1999 by Gromov. Shortly after it was developed and studied systematically by Lindenstrauss and Weiss. In particular they proved the following key fact: a system with finite topological entropy has zero mean dimension. For various topological dynamical systems with infinite topological entropy, the mean dimension can be calculated or at least bounded from below and above. This allows mean dimension to be used to distinguish between systems with infinite topological entropy. Mean dimension is also related to the problem of embedding topological dynamical systems in shift spaces.

Best node search (BNS), originally known as fuzzified game tree search, is a minimax search algorithm, developed in 2011. The idea is that the knowledge that one subtree is relatively better than some other(s) may be propagated sooner than the absolute value of minimax for that subtree. Then a repetitive search narrows until a particular node is shown to be relatively best.

Graph cut optimization is a combinatorial optimization method applicable to a family of functions of discrete variables, named after the concept of cut in the theory of flow networks. Thanks to the max-flow min-cut theorem, determining the minimum cut over a graph representing a flow network is equivalent to computing the maximum flow over the network. Given a pseudo-Boolean function , if it is possible to construct a flow network with positive weights such that

References

  1. 1 2 3 4 Russell, Stuart Jonathan; Norvig, Peter; Davis, Ernest (2010). Artificial Intelligence: A Modern Approach. Prentice Hall. pp. 177–178. ISBN   978-0-13-604259-4.
  2. Michie, D. (1966). "Game-Playing and Game-Learning Automata". Advances in Programming and Non-Numerical Computation. pp. 183–200. doi:10.1016/B978-0-08-011356-2.50011-2. ISBN   978-0-08-011356-2.
  3. Ballard, Bruce W. (September 1983). "The *-minimax search procedure for trees containing chance nodes". Artificial Intelligence. 21 (3): 327–350. doi:10.1016/S0004-3702(83)80015-0.
  4. Hauk, Thomas; Buro, Michael; Schaeffer, Jonathan (2006). "Rediscovering *-Minimax Search". Computers and Games. Lecture Notes in Computer Science. Vol. 3846. pp. 35–50. doi:10.1007/11674399_3. ISBN   978-3-540-32488-1.