Poker has always been considered the quintessential imperfect-information game: imperfect-information meaning that no player has full knowledge of the game state (in poker, the other players' cards). While we as computer scientists have made great strides in developing AI to play perfect-information games like chess and checkers, it is only recently that we've been able to crack the world of imperfect-information games.

In 2017, the DeepStack AI was revealed to the world, boasting the ability to beat professional human players at poker's most popular variant: Texas hold'em. More specifically, the AI was designed to play the heads-up no-limit variant of Texas hold'em, which means the game has no cap on bet sizes and is restricted to 2 players. Interestingly, since the game involves 2 players and imperfect information, it becomes a prime candidate for analysis with game theory.
Game theory can help us, for instance, determine how often we should be value-betting vs. bluffing when we’re on a given range.
Let’s break that terminology down a bit. Texas hold’em poker consists of 4 betting rounds, and the actions you take in each round [check, call, raise, fold] can give away how strong your hand could be. The likely range of hands you could be holding based on these actions is known as your range.
Now consider the bet on the river (after all community cards have been revealed) in a situation where you've pegged your opponent on a very tight range. At this point, you know for sure whether you have the winning or losing hand, but either one could still let you win the pot. If you have the winning hand, you want to value-bet: raising just enough that your opponent calls and you win a bigger pot. Conversely, if you have the losing hand, you want to bluff: raising enough to intimidate your opponent into folding, so you still win the pot.
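To make the trade-off concrete, here is a minimal sketch of the expected value of each line. The helper functions and parameter names are illustrative (not from DeepStack or any poker library); payoffs are measured relative to checking the hand down, and we assume a bluff loses whenever it is called.

```python
def value_bet_ev(pot, bet, call_prob):
    # Holding the winner: we take the pot whether the opponent calls or
    # folds; a call adds their matching bet to our winnings.
    return call_prob * (pot + bet) + (1 - call_prob) * pot

def bluff_ev(pot, bet, call_prob):
    # Holding the loser: a fold wins us the pot,
    # but a call costs us the bet we put in.
    return call_prob * (-bet) + (1 - call_prob) * pot
```

With a $100 pot and a $100 bet, an opponent who calls half the time makes the value-bet worth $150 on average, while the bluff breaks even; how often the opponent calls is exactly what the game-theoretic analysis below pins down.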
With all this in mind, consider the following scenario:
- The pot is sitting at $100
- The river card has just been revealed
- The opponent has placed you in some range R
- You’ve placed your opponent in a very narrow range
- Action starts with you
Here, you have 2 'games' to play out, depending on whether you've determined you hold the winning or losing hand.
This is the game you play if you have the winning hand (payoffs to Player A):

| Player A \ Player B | Call | Fold |
| --- | --- | --- |
| ValueBet | $200 | $100 |
| Fold | $0 | $0 |
This is the game you play if you have the losing hand (payoffs to Player A):

| Player A \ Player B | Call | Fold |
| --- | --- | --- |
| Bluff | -$100 | $100 |
| Fold | $0 | $0 |
Note that these games show only one payoff value per cell. Only Player A's payoffs appear in these tables because Player B is playing an entirely different game: action in poker is sequential, not simultaneous, which means Player B can, and should, react to Player A's choice before making their own. Player B only has 1 game to play, shown below:
| Player B \ Player A | ValueBet | Bluff |
| --- | --- | --- |
| Call | -$100 | $200 |
| Fold | $0 | $0 |
Notice that there is no option for Player A to fold. The game above can be treated as simultaneous, since B does not know whether A value-bet or bluffed; had A folded, B would know and would simply make $100.
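B's situation can be sketched directly from the table above. In this hypothetical helper, q is the probability that A is value-betting (rather than bluffing) when the bet comes in:

```python
def b_call_ev(q, call_vs_vb=-100, call_vs_bluff=200):
    # B's expected payoff for calling, given a belief that A
    # value-bets with probability q (payoffs from the table above).
    return q * call_vs_vb + (1 - q) * call_vs_bluff

def b_best_response(q):
    # Folding always pays $0, so B calls whenever calling
    # has strictly positive expectation.
    return "Call" if b_call_ev(q) > 0 else "Fold"
```

If A value-bets too rarely (small q), B profits by always calling; if A value-bets too often, B should always fold. This is why A must commit to a specific mix.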
Now, since A cannot control which game they are playing, they have to play in a way that maximizes their profit regardless of what B plays. Since B only has a choice when A doesn't fold, A has to settle on a ValueBet:Bluff ratio (folding when needed to maintain it). We can find this ratio by solving for q in the mixed Nash equilibrium of B's game (as seen in lecture).
It turns out that q = 2/3, meaning that A should value-bet twice as often as they bluff if they want B to be indifferent between calling and folding. We know B will be indifferent because all of this happens while they believe A is in range R, so they have no other information to work with. We can now find A's expected payoff: if B chooses to call, then A gets 2/3(200) + 1/3(-100) = $100, and if B chooses to fold, then A gets 2/3(100) + 1/3(100) = $100. Therefore, by using game theory, Player A can guarantee an expected payoff of $100 when playing range R.
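The equilibrium calculation above is short enough to check by hand in plain Python (the variable names are mine; the payoffs come from the tables):

```python
# B's payoffs from B's game: Call vs ValueBet = -100, Call vs Bluff = +200,
# Fold = 0 either way.
call_vs_vb, call_vs_bluff = -100, 200

# Indifference for B: q*(-100) + (1-q)*(+200) = 0, solved for q.
q = call_vs_bluff / (call_vs_bluff - call_vs_vb)  # = 200/300 = 2/3

# A's payoffs from A's two games, weighted by the ValueBet:Bluff mix.
ev_if_b_calls = q * 200 + (1 - q) * (-100)  # value-bet called / bluff called
ev_if_b_folds = q * 100 + (1 - q) * 100     # B folds either way
```

Both branches come out to $100, confirming that the 2:1 mix makes B's choice irrelevant to A's bottom line.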
Obviously, this was a very simple and isolated example, and DeepStack employs many more sophisticated techniques to produce its results. But it is insightful to see how just the basics of game theory can be directly applied to better your poker play. GTO [Game Theory Optimal] poker is a growing trend in the scene, and for good reason. After all: if you can't beat them, join them.
References
- Corrigan, Rory. “How You Should Think About Poker (But Probably Don’t).” Upswing Poker, 8 Dec. 2017.
- Webpage found at: https://upswingpoker.com/gto-poker-game-theory-optimal-strategy/
- Moravčík, Matej, et al. "DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker." Science, vol. 356, 2017. doi:10.1126/science.aam6960.
- Companion Webpage found at: https://www.deepstack.ai