The action R ends the game with payoff (5, 2), while L continues the game, offering Player 2 a choice between l and r (and the corresponding rewards).
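If both players could see exactly where they are in this tree (perfect information), it could be solved by backward induction: settle Player 2's choice first, then let Player 1 compare R against whatever L leads to. The text does not give the payoffs for l and r, so the sketch below assumes hypothetical values (6, 1) and (3, 3); `Node` and `backward_induction` are illustrative names, not from any library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Payoff = Tuple[int, int]  # (payoff to Player 1, payoff to Player 2)


@dataclass
class Node:
    player: Optional[int] = None  # 0 = Player 1, 1 = Player 2, None at a leaf
    payoff: Optional[Payoff] = None  # set only at leaves
    children: Dict[str, "Node"] = field(default_factory=dict)


def backward_induction(node: Node) -> Tuple[Payoff, List[str]]:
    """Return the outcome and the actions played along the path, assuming perfect information."""
    if node.payoff is not None:
        return node.payoff, []
    best: Optional[Tuple[Payoff, List[str]]] = None
    for action, child in node.children.items():
        payoff, plan = backward_induction(child)
        # The player to move keeps the action that maximizes their own payoff.
        if best is None or payoff[node.player] > best[0][node.player]:
            best = (payoff, [action] + plan)
    return best


game = Node(player=0, children={
    "R": Node(payoff=(5, 2)),
    "L": Node(player=1, children={
        "l": Node(payoff=(6, 1)),  # hypothetical payoff
        "r": Node(payoff=(3, 3)),  # hypothetical payoff
    }),
})
```

With these assumed numbers, Player 2 would answer L with r, so Player 1 takes the sure (5, 2) from R even though (6, 1) looks tempting. This reasoning only works because each player knows which node they are at.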
Thus, imperfect information makes a crucial difference in the decision-making process. Games in which one player's gain is exactly the other player's loss are called zero-sum. An extensive form game, like poker, consists of multiple turns. Libratus solves the blueprint using counterfactual regret minimization (CFR), an iterative algorithm whose cost per iteration is linear in the size of the game tree and which converges toward a Nash equilibrium in two-player zero-sum extensive form games. John Nash, a Nobel laureate, was one of the most important figures in game theory.
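To make the CFR loop concrete, here is a minimal sketch of vanilla CFR on Kuhn poker, a three-card toy poker game that the text does not discuss (cards 0, 1, 2 stand for Jack, Queen, King; each player antes 1 and may pass or bet 1). It is not Libratus's implementation, which runs a scaled-up variant on a vastly larger game. Each iteration walks the whole tree for every deal, applies regret matching at each information set, and accumulates an average strategy; the names `InfoSetNode`, `cfr`, and `train` are illustrative. Kuhn poker's equilibrium value for the first player is -1/18.

```python
import itertools

NUM_ACTIONS = 2  # index 0 = pass/check/fold ("p"), index 1 = bet/call ("b")


class InfoSetNode:
    """Cumulative regrets and strategy weights for one information set."""

    def __init__(self):
        self.regret_sum = [0.0] * NUM_ACTIONS
        self.strategy_sum = [0.0] * NUM_ACTIONS

    def strategy(self, own_reach):
        # Regret matching: play each action in proportion to its positive cumulative regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        if total > 0:
            strat = [p / total for p in positive]
        else:
            strat = [1.0 / NUM_ACTIONS] * NUM_ACTIONS
        # Accumulate the reach-weighted strategy used to form the average strategy.
        for a in range(NUM_ACTIONS):
            self.strategy_sum[a] += own_reach * strat[a]
        return strat

    def average_strategy(self):
        # The average strategy, not the current one, is what approaches equilibrium.
        total = sum(self.strategy_sum)
        if total > 0:
            return [s / total for s in self.strategy_sum]
        return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


nodes = {}  # information-set key (own card + betting history) -> InfoSetNode


def terminal_utility(cards, history):
    """Payoff for the player to act at `history`, or None if the hand continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    higher = cards[player] > cards[1 - player]
    if history[-1] == "p":
        if history == "pp":
            return 1 if higher else -1  # check-check: showdown for the antes
        return 1  # the opponent folded to a bet
    if history[-2:] == "bb":
        return 2 if higher else -2  # bet-call: showdown for antes plus bets
    return None


def cfr(cards, history, reach0, reach1):
    """Expected utility for the player to act at `history` under current strategies."""
    util = terminal_utility(cards, history)
    if util is not None:
        return util
    player = len(history) % 2
    node = nodes.setdefault(str(cards[player]) + history, InfoSetNode())
    strat = node.strategy(reach0 if player == 0 else reach1)
    action_utils = [0.0] * NUM_ACTIONS
    node_util = 0.0
    for a, symbol in enumerate("pb"):
        # Zero-sum: the child's value for the opponent is the negative of ours.
        if player == 0:
            action_utils[a] = -cfr(cards, history + symbol, reach0 * strat[a], reach1)
        else:
            action_utils[a] = -cfr(cards, history + symbol, reach0, reach1 * strat[a])
        node_util += strat[a] * action_utils[a]
    # Counterfactual regret is weighted by the opponent's probability of reaching here.
    opp_reach = reach1 if player == 0 else reach0
    for a in range(NUM_ACTIONS):
        node.regret_sum[a] += opp_reach * (action_utils[a] - node_util)
    return node_util


def train(iterations):
    """Run vanilla CFR over every deal; return Player 1's average game value."""
    deals = list(itertools.permutations(range(3), 2))  # (Player 1 card, Player 2 card)
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(cards, "", 1.0, 1.0)
    return total / (iterations * len(deals))
```

Running `train(10_000)` should give an average value close to -1/18 for Player 1, and `nodes["2b"].average_strategy()` should put nearly all its weight on calling, since a player holding the King facing a bet can never do better by folding.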
Also in 2015, DeepMind's AlphaGo used similar deep reinforcement learning techniques to beat a professional Go player for the first time in history.
Solving the blueprint
The blueprint is orders of magnitude smaller than the number of possible states in the full game.
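One common way to shrink a no-limit poker game to blueprint size is action abstraction: the blueprint only considers a handful of bet sizes instead of every legal amount. The sketch below maps a concrete bet onto the nearest size in a hypothetical menu of pot fractions plus all-in; `ABSTRACT_FRACTIONS`, `abstract_bet`, and all numbers are assumptions for illustration, not the abstraction Libratus actually used.

```python
# Hypothetical abstract bet sizes, expressed as fractions of the current pot.
ABSTRACT_FRACTIONS = [0.5, 1.0, 2.0]


def abstract_bet(bet, pot, stack):
    """Map a concrete bet to the closest bet size allowed in the abstraction."""
    # Pot-fraction sizes that fit under the stack, plus the all-in option.
    candidates = [f * pot for f in ABSTRACT_FRACTIONS if f * pot < stack] + [stack]
    # Ties go to the earlier (smaller) candidate, since min() keeps the first minimum.
    return min(candidates, key=lambda size: abs(size - bet))


# With a pot of 100 and a stack of 1000, every legal bet from 1 to 1000
# collapses onto one of just four abstract actions.
abstract_actions = {abstract_bet(b, 100, 1000) for b in range(1, 1001)}
```

The design trade-off is resolution versus size: more fractions make the blueprint play closer to the real game but multiply the number of states it must solve.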
A normal form game
For our purposes, we will start with the normal form definition of a game.
In this two-player game, each player has a collection of available actions: A_1 for Player 1 and A_2 for Player 2. Note that Player 1 cannot distinguish which node they are in. Thus, it is hard to say which action is optimal. That is, until Libratus came along.
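In normal form, a two-player game is just the action sets A_1 and A_2 plus a payoff for every pair of actions. The sketch below enumerates all pure-strategy profiles and keeps those where neither player gains by deviating alone (pure-strategy Nash equilibria). The payoff table is hypothetical, chosen so that R yields (5, 2) regardless of Player 2's action; `pure_nash_equilibria` is an illustrative helper name.

```python
from itertools import product

A1 = ["L", "R"]  # actions available to Player 1
A2 = ["l", "r"]  # actions available to Player 2

# u[(a1, a2)] = (payoff to Player 1, payoff to Player 2); all values are hypothetical.
u = {
    ("R", "l"): (5, 2),
    ("R", "r"): (5, 2),
    ("L", "l"): (6, 1),
    ("L", "r"): (3, 3),
}


def pure_nash_equilibria(A1, A2, u):
    """Return every action pair where neither player has a profitable unilateral deviation."""
    equilibria = []
    for a1, a2 in product(A1, A2):
        p1_best = all(u[(a1, a2)][0] >= u[(b1, a2)][0] for b1 in A1)
        p2_best = all(u[(a1, a2)][1] >= u[(a1, b2)][1] for b2 in A2)
        if p1_best and p2_best:
            equilibria.append((a1, a2))
    return equilibria
```

With this table the only pure equilibrium is (R, r). Games like poker generally need mixed strategies, which is why solvers such as CFR work with probabilities over actions rather than enumerating pure profiles.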

Libratus eventually won by a staggering 14.7 big blinds per 100 hands, trouncing the world's top poker professionals with 99.98% statistical significance.
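The unit in this result, big blinds per 100 hands (bb/100), normalizes winnings by the size of the stakes and by the number of hands played, so results at different blind levels can be compared. A minimal sketch of the arithmetic, with `bb_per_100` as an illustrative name and the example figures purely hypothetical:

```python
def bb_per_100(chips_won: float, big_blind: float, hands: int) -> float:
    """Win rate in big blinds per 100 hands."""
    # Convert chips to big blinds, average per hand, then scale to 100 hands.
    return chips_won / big_blind / hands * 100


# Hypothetical session: 1,470 chips won at a big blind of 10 over 1,000 hands.
rate = bb_per_100(1_470, 10, 1_000)
```

Here `rate` works out to about 14.7 bb/100.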