Minimax (sometimes MinMax, MMor saddle point ) is a decision rule used in artificial intelligence, decision theory, game theory, statistics, and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. When dealing with gains, it is referred to as "maximin"—to maximize the minimum gain. Originally formulated for two-player zero-sum game theory, covering both the cases where players take alternate moves and those where they make simultaneous moves, it has also been extended to more complex games and to general decision-making in the presence of uncertainty.
The maximin value is the highest value that the player can be sure to get without knowing the actions of the other players; equivalently, it is the lowest value the other players can force the player to receive when they know the player's action. Its formal definition is:
Calculating the maximin value of a player is done in a worst-case approach: for each possible action of the player, we check all possible actions of the other players and determine the worst possible combination of actions—the one that gives player i the smallest value. Then, we determine which action player i can take in order to make sure that this smallest value is the highest possible.
For example, consider the following game for two players, where the first player ("row player") may choose any of three moves, labelled T, M, or B, and the second player ("column" player) may choose either of two moves, L or R. The result of the combination of both moves is expressed in a payoff table:
(where the first number in each cell is the pay-out of the row player and the second number is the pay-out of the column player).
For the sake of example, we consider only pure strategies. Check each player in turn:
If both players play their respective maximin strategies , the payoff vector is .
The minimax value of a player is the smallest value that the other players can force the player to receive, without knowing the player's actions; equivalently, it is the largest value the player can be sure to get when they know the actions of the other players. Its formal definition is:
The definition is very similar to that of the maximin value—only the order of the maximum and minimum operators is inverse. In the above example:
For every player i, the maximin is at most the minimax:
Intuitively, in maximin the maximization comes before the minimization, so player i tries to maximize their value before knowing what the others will do; in minimax the maximization comes after the minimization, so player i is in a much better position—they maximize their value knowing what the others did.
Another way to understand the notation is by reading from right to left: when we write
the initial set of outcomes depends on both and . We first marginalize away from , by maximizing over (for every possible value of ) to yield a set of marginal outcomes , which depends only on . We then minimize over over these outcomes. (Conversely for maximin.)
Although it is always the case that and , the payoff vector resulting from both players playing their minimax strategies, in the case of or in the case of , cannot similarly be ranked against the payoff vector resulting from both players playing their maximin strategy.
In two-player zero-sum games, the minimax solution is the same as the Nash equilibrium.
In the context of zero-sum games, the minimax theorem is equivalent to: [ failed verification ]
For every two-person, zero-sum game with finitely many strategies, there exists a value V and a mixed strategy for each player, such that
- (a) Given player 2's strategy, the best payoff possible for player 1 is V, and
- (b) Given player 1's strategy, the best payoff possible for player 2 is −V.
Equivalently, Player 1's strategy guarantees them a payoff of V regardless of Player 2's strategy, and similarly Player 2 can guarantee themselves a payoff of −V. The name minimax arises because each player minimizes the maximum payoff possible for the other—since the game is zero-sum, they also minimize their own maximum loss (i.e. maximize their minimum payoff). See also example of a game without a value.
|B chooses B1||B chooses B2||B chooses B3|
|A chooses A1||+3||−2||+2|
|A chooses A2||−1||0||+4|
|A chooses A3||−4||−3||+1|
The following example of a zero-sum game, where A and B make simultaneous moves, illustrates minimax solutions. Suppose each player has three choices and consider the payoff matrix for A displayed on the right. Assume the payoff matrix for B is the same matrix with the signs reversed (i.e. if the choices are A1 and B1 then B pays 3 to A). Then, the minimax choice for A is A2 since the worst possible result is then having to pay 1, while the simple minimax choice for B is B2 since the worst possible result is then no payment. However, this solution is not stable, since if B believes A will choose A2 then B will choose B1 to gain 1; then if A believes B will choose B1 then A will choose A1 to gain 3; and then B will choose B2; and eventually both players will realize the difficulty of making a choice. So a more stable strategy is needed.
Some choices are dominated by others and can be eliminated: A will not choose A3 since either A1 or A2 will produce a better result, no matter what B chooses; B will not choose B3 since some mixtures of B1 and B2 will produce a better result, no matter what A chooses.
A can avoid having to make an expected payment of more than 1∕3 by choosing A1 with probability 1∕6 and A2 with probability 5∕6: The expected payoff for A would be 3 × (1∕6) − 1 × (5∕6) = −1∕3 in case B chose B1 and −2 × (1∕6) + 0 × (5∕6) = −1/3 in case B chose B2. Similarly, B can ensure an expected gain of at least 1/3, no matter what A chooses, by using a randomized strategy of choosing B1 with probability 1∕3 and B2 with probability 2∕3. These mixed minimax strategies are now stable and cannot be improved.
Frequently, in game theory, maximin is distinct from minimax. Minimax is used in zero-sum games to denote minimizing the opponent's maximum payoff. In a zero-sum game, this is identical to minimizing one's own maximum loss, and to maximizing one's own minimum gain.
"Maximin" is a term commonly used for non-zero-sum games to describe the strategy which maximizes one's own minimum payoff. In non-zero-sum games, this is not generally the same as minimizing the opponent's maximum gain, nor the same as the Nash equilibrium strategy.
The minimax values are very important in the theory of repeated games. One of the central theorems in this theory, the folk theorem, relies on the minimax values.
In combinatorial game theory, there is a minimax algorithm for game solutions.
A simple version of the minimax algorithm, stated below, deals with games such as tic-tac-toe, where each player can win, lose, or draw. If player A can win in one move, their best move is that winning move. If player B knows that one move will lead to the situation where player A can win in one move, while another move will lead to the situation where player A can, at best, draw, then player B's best move is the one leading to a draw. Late in the game, it's easy to see what the "best" move is. The Minimax algorithm helps find the best move, by working backwards from the end of the game. At each step it assumes that player A is trying to maximize the chances of A winning, while on the next turn player B is trying to minimize the chances of A winning (i.e., to maximize B's own chances of winning).
A minimax algorithm 's turn to move, A gives a value to each of their legal moves.is a recursive algorithm for choosing the next move in an n-player game, usually a two-player game. A value is associated with each position or state of the game. This value is computed by means of a position evaluation function and it indicates how good it would be for a player to reach that position. The player then makes the move that maximizes the minimum value of the position resulting from the opponent's possible following moves. If it is A
A possible allocation method consists in assigning a certain win for A as +1 and for B as −1. This leads to combinatorial game theory as developed by John Horton Conway. An alternative is using a rule that if the result of a move is an immediate win for A it is assigned positive infinity and if it is an immediate win for B, negative infinity. The value to A of any other move is the maximum of the values resulting from each of B's possible replies. For this reason, A is called the maximizing player and B is called the minimizing player, hence the name minimax algorithm. The above algorithm will assign a value of positive or negative infinity to any position since the value of every position will be the value of some final winning or losing position. Often this is generally only possible at the very end of complicated games such as chess or go, since it is not computationally feasible to look ahead as far as the completion of the game, except towards the end, and instead, positions are given finite values as estimates of the degree of belief that they will lead to a win for one player or another.
This can be extended if we can supply a heuristic evaluation function which gives values to non-final game states without considering all possible following complete sequences. We can then limit the minimax algorithm to look only at a certain number of moves ahead. This number is called the "look-ahead", measured in "plies". For example, the chess computer Deep Blue (the first one to beat a reigning world champion, Garry Kasparov at that time) looked ahead at least 12 plies, then applied a heuristic evaluation function.
The algorithm can be thought of as exploring the nodes of a game tree . The effective branching factor of the tree is the average number of children of each node (i.e., the average number of legal moves in a position). The number of nodes to be explored usually increases exponentially with the number of plies (it is less than exponential if evaluating forced moves or repeated positions). The number of nodes to be explored for the analysis of a game is therefore approximately the branching factor raised to the power of the number of plies. It is therefore impractical to completely analyze games such as chess using the minimax algorithm.
The performance of the naïve minimax algorithm may be improved dramatically, without affecting the result, by the use of alpha-beta pruning. Other heuristic pruning methods can also be used, but not all of them are guaranteed to give the same result as the un-pruned search.
A naïve minimax algorithm may be trivially modified to additionally return an entire Principal Variation along with a minimax score.
The pseudocode for the depth limited minimax algorithm is given below.
function minimax(node, depth, maximizingPlayer) isif depth = 0 or node is a terminal node thenreturn the heuristic value of node if maximizingPlayer then value := −∞ for each child of node do value := max(value, minimax(child, depth − 1, FALSE)) return value else(* minimizing player *) value := +∞ for each child of node do value := min(value, minimax(child, depth − 1, TRUE)) return value
(* Initial call *) minimax(origin, depth, TRUE)
The minimax function returns a heuristic value for leaf nodes (terminal nodes and nodes at the maximum search depth). Non leaf nodes inherit their value from a descendant leaf node. The heuristic value is a score measuring the favorability of the node for the maximizing player. Hence nodes resulting in a favorable outcome, such as a win, for the maximizing player have higher scores than nodes more favorable for the minimizing player. The heuristic value for terminal (game ending) leaf nodes are scores corresponding to win, loss, or draw, for the maximizing player. For non terminal leaf nodes at the maximum search depth, an evaluation function estimates a heuristic value for the node. The quality of this estimate and the search depth determine the quality and accuracy of the final minimax result.
Minimax treats the two players (the maximizing player and the minimizing player) separately in its code. Based on the observation that , minimax may often be simplified into the negamax algorithm.
Suppose the game being played only has a maximum of two possible moves per player each turn. The algorithm generates the tree on the right, where the circles represent the moves of the player running the algorithm (maximizing player), and squares represent the moves of the opponent (minimizing player). Because of the limitation of computation resources, as explained above, the tree is limited to a look-ahead of 4 moves.
The algorithm evaluates each leaf node using a heuristic evaluation function, obtaining the values shown. The moves where the maximizing player wins are assigned with positive infinity, while the moves that lead to a win of the minimizing player are assigned with negative infinity. At level 3, the algorithm will choose, for each node, the smallest of the child node values, and assign it to that same node (e.g. the node on the left will choose the minimum between "10" and "+∞", therefore assigning the value "10" to itself). The next step, in level 2, consists of choosing for each node the largest of the child node values. Once again, the values are assigned to each parent node . The algorithm continues evaluating the maximum and minimum values of the child nodes alternately until it reaches the root node , where it chooses the move with the largest value (represented in the figure with a blue arrow). This is the move that the player should make in order to minimize the maximum possible loss.
Minimax theory has been extended to decisions where there is no other player, but where the consequences of decisions depend on unknown facts. For example, deciding to prospect for minerals entails a cost which will be wasted if the minerals are not present, but will bring major rewards if they are. One approach is to treat this as a game against nature (see move by nature), and using a similar mindset as Murphy's law or resistentialism, take an approach which minimizes the maximum expected loss, using the same techniques as in the two-person zero-sum games.
In addition, expectiminimax trees have been developed, for two-player games in which chance (for example, dice) is a factor.
In classical statistical decision theory, we have an estimator that is used to estimate a parameter . We also assume a risk function , usually specified as the integral of a loss function. In this framework, is called minimax if it satisfies
An alternative criterion in the decision theoretic framework is the Bayes estimator in the presence of a prior distribution . An estimator is Bayes if it minimizes the average risk
A key feature of minimax decision making is being non-probabilistic: in contrast to decisions using expected value or expected utility, it makes no assumptions about the probabilities of various outcomes, just scenario analysis of what the possible outcomes are. It is thus robust to changes in the assumptions, as these other decision techniques are not. Various extensions of this non-probabilistic approach exist, notably minimax regret and Info-gap decision theory.
Further, minimax only requires ordinal measurement (that outcomes be compared and ranked), not interval measurements (that outcomes include "how much better or worse"), and returns ordinal data, using only the modeled outcomes: the conclusion of a minimax analysis is: "this strategy is minimax, as the worst case is (outcome), which is less bad than any other strategy". Compare to expected value analysis, whose conclusion is of the form: "this strategy yields E(X)=n." Minimax thus can be used on ordinal data, and can be more transparent.
In philosophy, the term "maximin" is often used in the context of John Rawls's A Theory of Justice, where he refers to it (Rawls 1971, p. 152) in the context of The Difference Principle. Rawls defined this principle as the rule which states that social and economic inequalities should be arranged so that "they are to be of the greatest benefit to the least-advantaged members of society".
During the 1997 match, the software search extended the search to about 40 plies along the forcing lines, even though the nonextended search reached only about 12 plies.
|Look up minimax in Wiktionary, the free dictionary.|
|Wikiquote has quotations related to: Minimax|
In game theory and economic theory, a zero-sum game is a mathematical representation of a situation in which each participant's gain or loss of utility is exactly balanced by the losses or gains of the utility of the other participants. If the total gains of the participants are added up and the total losses are subtracted, they will sum to zero. Thus, cutting a cake, where taking a larger piece reduces the amount of cake available for others as much as it increases the amount available for that taker, is a zero-sum game if all participants value each unit of cake equally.
In game theory, the Nash equilibrium, named after the mathematician John Forbes Nash Jr., is a proposed solution of a non-cooperative game involving two or more players in which each player is assumed to know the equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy.
Dijkstra's algorithm is an algorithm for finding the shortest paths between nodes in a graph, which may represent, for example, road networks. It was conceived by computer scientist Edsger W. Dijkstra in 1956 and published three years later.
Pattern recognition is the automated recognition of patterns and regularities in data. It has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power. However, these activities can be viewed as two facets of the same field of application, and together they have undergone substantial development over the past few decades. A modern definition of pattern recognition is:
The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories.
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of a probability distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
Alpha–beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player games. It stops evaluating a move when at least one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further. When applied to a standard minimax tree, it returns the same move as minimax would, but prunes away branches that cannot possibly influence the final decision.
A Bayesian network, Bayes network, belief network, decision network, Bayes(ian) model or probabilistic directed acyclic graphical model is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph (DAG). Bayesian networks are ideal for taking an event that occurred and predicting the likelihood that any one of several possible known causes was the contributing factor. For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.
In mathematical optimization and decision theory, a loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event. An optimization problem seeks to minimize a loss function. An objective function is either a loss function or its negative, in which case it is to be maximized.
In statistics, an expectation–maximization (EM) algorithm is an iterative method to find maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
In game theory, a cooperative game is a game with competition between groups of players ("coalitions") due to the possibility of external enforcement of cooperative behavior. Those are opposed to non-cooperative games in which there is either no possibility to forge alliances or all agreements need to be self-enforcing.
Mechanism design is a field in economics and game theory that takes an objectives-first approach to designing economic mechanisms or incentives, toward desired objectives, in strategic settings, where players act rationally. Because it starts at the end of the game, then goes backwards, it is also called reverse game theory. It has broad applications, from economics and politics to networked-systems.
An extensive-form game is a specification of a game in game theory, allowing for the explicit representation of a number of key aspects, like the sequencing of players' possible moves, their choices at every decision point, the information each player has about the other player's moves when they make a decision, and their payoffs for all possible game outcomes. Extensive-form games also allow for the representation of incomplete information in the form of chance events modeled as "moves by nature".
Negamax search is a variant form of minimax search that relies on the zero-sum property of a two-player game.
In game theory, folk theorems are a class of theorems about possible Nash equilibrium payoff profiles in repeated games. The original Folk Theorem concerned the payoffs of all the Nash equilibria of an infinitely repeated game. This result was called the Folk Theorem because it was widely known among game theorists in the 1950s, even though no one had published it. Friedman's (1971) Theorem concerns the payoffs of certain subgame-perfect Nash equilibria (SPE) of an infinitely repeated game, and so strengthens the original Folk Theorem by using a stronger equilibrium concept: subgame-perfect Nash equilibria rather than Nash equilibria.
In game theory, a subgame perfect equilibrium is a refinement of a Nash equilibrium used in dynamic games. A strategy profile is a subgame perfect equilibrium if it represents a Nash equilibrium of every subgame of the original game. Informally, this means that if the players played any smaller game that consisted of only one part of the larger game, their behavior would represent a Nash equilibrium of that smaller game. Every finite extensive game with perfect recall has a subgame perfect equilibrium.
In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) from observations an estimator is called minimax if its maximal risk is minimal among all estimators of . In a sense this means that is an estimator which performs best in the worst possible case allowed in the problem.
In the mathematical theory of games, in particular the study of zero-sum continuous games, not every game has a minimax value. This is the expected value to one of the players when both play a perfect strategy.
In decision theory and game theory, Wald's maximin model is a non-probabilistic decision-making model according to which decisions are ranked on the basis of their worst-case outcomes – the optimal decision is one with the least worst outcome. It is one of the most important models in robust decision making in general and robust optimization in particular.
Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists in choosing the action that maximizes the expected reward with respect to a randomly drawn belief.