The Problem with AlphaZero for Imperfect Information Games

The ReBeL [1] paper presents an example, borrowed from "Depth-Limited Solving for Imperfect-Information Games" [3], of the limitations of algorithms like AlphaZero [2] in imperfect-information games:

[Figure: ReBeL paper's example of how AlphaZero fails]

In short [1]: "This [example] illustrates a critical challenge of imperfect-information games: unlike perfect-information games and single-agent settings, the value of an action may depend on the probability it is chosen."

This is a valuable example, but there is an even simpler one showing how AlphaZero breaks down in the imperfect-information setting, even without an adversarial opponent. Consider the following environment:

Step 1. The agent is randomly assigned to the Heads or Tails world (50% probability each), but this information is hidden from the agent.
Step 2. The agent is given a chance to take a gamble. If they choose not to, the episode ends immediately with a reward of 0.
Step 3. If the agent chose t