Minimax Principle
The minimax principle is a decision rule used in decision theory, game theory, statistics, and philosophy for minimizing the possible loss for a worst case (maximum loss) scenario. When dealing with gains, it is referred to as “maximin”—to maximize the minimum gain. The principle is also known as the principle of pessimism, the principle of prudence, and the principle of maximum entropy.
Note: Good video about the minimax principle: Algorithms Explained – minimax and alpha-beta pruning by Sebastian Lague.
Two-Player Zero-Sum Games
The games most commonly studied within AI are what game theorists call deterministic, two-player, turn-taking, perfect information, zero-sum games.
- Deterministic - the outcome of a game is determined by the actions of the players;
- Perfect information - each player knows the complete state of the game at any point in time;
- Zero-sum - the total payoff of all players is zero, meaning that whats good for one player is bad for the other.
During this chapter:
- We will call the two players MAX and MIN;
- MAX moves first, and then MIN moves, and so on until the game is over;
- At the end of the game, points are awarded to the winner and penalties are awarded to the loser.
A game can be formally defined with the following elements:
S0- the initial state of the game;TO-MOVE(s)- the player who has the move in states;ACTIONS(s)- the set of legal moves in states;RESULT(s, a)- the state that results from doing moveain states;IS-TERMINAL(s)- a predicate that is true ifsis a terminal state;UTILITY(s, p)- the utility of statesto playerp; defines the payoff for playerpin states.
Minimax Tree
A minimax tree is a tree that represents all possible outcomes of a game, assuming that both players play optimally.
- The root of the tree is the initial state of the game;
- The MAX player is represented by the maximizing nodes;
- The MIN player is represented by the minimizing nodes;
In the image bellow, we can see that the MAX player always chooses the maximum value, and the MIN player always chooses the minimum value:
In this example:
- The MAX chooses the A1 node, because it has the maximum value;
- Then, the MIN chooses the A11 node, because it has the minimum value;
- …
Minimax Value
The minimax value is the utility of being in that state, assuming that both players play optimally.
MINIMAX(s) =
if IS-TERMINAL(s) then
return UTILITY(s, p)
else if TO-MOVE(s) = MAX then
return MAX(MINIMAX(RESULT(s, a)))
else if TO-MOVE(s) = MIN then
return MIN(MINIMAX(RESULT(s, a)))
Alpha-Beta Pruning
Alpha-beta pruning is a search algorithm that seeks to decrease the number of nodes that are evaluated by the minimax algorithm in its search tree. It is an adversarial search algorithm used commonly for machine playing of two-player games (Tic-tac-toe, Chess, Go, etc.).
- The alpha-beta pruning algorithm is a modification of the minimax algorithm that allows us to prune the search tree, reducing the number of nodes that we need to evaluate;
- It keeps track of two values,
alphaandbeta, which represent the best possible score that the MAX player can guarantee and the best possible score that the MIN player can guarantee, respectively; - The
alphavalue represents the best choice that the MAX player can make at that point in the game, and thebetavalue represents the best choice that the MIN player can make at that point in the game.