Counterfactual Regret Minimization For Decentralized Planning
Computing Sciences and Computer Engineering
Regret minimization is an effective technique for almost surely producing Nash equilibrium policies in coordination games in strategic form. Decentralized POMDPs offer a realistic model for sequential coordination problems, but they yield doubly exponential-sized games in strategic form. Recently, counterfactual regret has offered a way to decompose total regret along an (extensive-form) game tree into components that can be individually controlled, such that minimizing all of them minimizes the total regret as well. However, a straightforward extension of this decomposition to decentralized POMDPs leads to a complexity exponential in both the joint action and joint observation spaces. We present a more tractable approach to regret minimization in which the regret is decomposed along the nodes of agents' policy trees, yielding a complexity exponential only in the joint observation space. We present an algorithm, REMIT, that minimizes regret by this decomposition and prove that it converges to a Nash equilibrium policy in the limit. We also use a stronger convergence criterion with REMIT, such that if this criterion is met then the algorithm must output a Nash equilibrium policy in finite time. We found empirically that on every benchmark problem we tested, this criterion was indeed met and (near) optimal Nash equilibrium policies were achieved.
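The abstract's starting point, regret minimization in a strategic-form coordination game, can be illustrated with a minimal sketch. This is not the paper's REMIT algorithm; it is a standard regret-matching loop for a hypothetical two-agent, two-action coordination game (both agents receive payoff 1 when their actions match), where each agent's average regret shrinks over time and the averaged play approaches an equilibrium. The names `regret_matching` and `run` are illustrative, not from the paper.

```python
import random

def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / len(cum_regret)] * len(cum_regret)  # uniform fallback

# Symmetric 2x2 coordination game: payoff 1 if both agents pick the same action.
PAYOFF = [[1.0, 0.0],
          [0.0, 1.0]]

def run(iters=20000, seed=0):
    rng = random.Random(seed)
    cum_regret = [[0.0, 0.0], [0.0, 0.0]]  # per agent, per action
    strat_sum = [[0.0, 0.0], [0.0, 0.0]]   # for the average strategy
    for _ in range(iters):
        strats = [regret_matching(cum_regret[i]) for i in range(2)]
        acts = [0 if rng.random() < s[0] else 1 for s in strats]
        for i in range(2):
            other = acts[1 - i]
            u = PAYOFF[acts[i]][other]
            for a in range(2):
                # Regret: how much better action a would have done this round.
                cum_regret[i][a] += PAYOFF[a][other] - u
                strat_sum[i][a] += strats[i][a]
    avg_strats = [[s / iters for s in strat_sum[i]] for i in range(2)]
    avg_regret = max(max(r) for r in cum_regret) / iters
    return avg_strats, avg_regret
```

Regret matching guarantees that each agent's average positive regret vanishes at rate O(1/sqrt(T)), which is the property the paper's decomposition extends from single game nodes to the nodes of agents' policy trees.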
The Eighth Annual Workshop On Multiagent Sequential Decision-Making Under Uncertainty
(2013). Counterfactual Regret Minimization For Decentralized Planning. The Eighth Annual Workshop On Multiagent Sequential Decision-Making Under Uncertainty.
Available at: https://aquila.usm.edu/fac_pubs/20617