Rehearsal Based Multi-Agent Reinforcement Learning of Decentralized Plans
Computing Sciences and Computer Engineering
Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Reinforcement learning (RL) based approaches have recently been proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. In practical scenarios this may not necessarily be the case, and agents may have difficulty learning under unnecessary constraints. We propose a novel RL approach in which agents are allowed to rehearse with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on this information. We show experimentally that incorporating such information can ease the difficulties faced by non-rehearsal-based learners, and demonstrate fast, (near) optimal performance on many existing benchmark Dec-POMDP problems. We also propose a new benchmark that is less abstract than existing problems and is designed to be particularly challenging to RL-based solvers, as a target for current and future research on RL solutions to Dec-POMDPs.
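The rehearsal idea described in the abstract (learning with privileged information while keeping the executed policy decentralized) can be illustrated with a minimal tabular Q-learning sketch. This is an illustrative assumption, not the paper's exact algorithm: the class name `RehearsalQLearner`, the two-table design, and all parameter defaults are hypothetical, chosen only to show how privileged full-state information can guide exploration during learning while the execution-time policy conditions solely on the agent's local observation.

```python
import random
from collections import defaultdict

class RehearsalQLearner:
    """Hypothetical sketch: during learning the agent may peek at the
    full state (rehearsal), but the executed policy conditions only on
    the local observation, so execution remains decentralized."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.2):
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_full = defaultdict(float)  # Q(state, action): rehearsal only
        self.q_obs = defaultdict(float)   # Q(obs, action): used at execution

    def choose_rehearsal(self, state):
        # Epsilon-greedy exploration guided by privileged full-state values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q_full[(state, a)])

    def update(self, state, obs, action, reward, next_state, next_obs):
        # Update both tables; only q_obs survives to policy execution.
        best_full = max(self.q_full[(next_state, a)] for a in self.actions)
        self.q_full[(state, action)] += self.alpha * (
            reward + self.gamma * best_full - self.q_full[(state, action)])
        best_obs = max(self.q_obs[(next_obs, a)] for a in self.actions)
        self.q_obs[(obs, action)] += self.alpha * (
            reward + self.gamma * best_obs - self.q_obs[(obs, action)])

    def choose_execution(self, obs):
        # Decentralized policy: no access to the full state.
        return max(self.actions, key=lambda a: self.q_obs[(obs, a)])
```

At execution time only `choose_execution` is used, so the learned behavior never explicitly depends on the rehearsal-only state information, mirroring the constraint the abstract describes.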
12th International Conference on Autonomous Agents and Multiagent Systems
(2013). Rehearsal Based Multi-Agent Reinforcement Learning of Decentralized Plans. 12th International Conference on Autonomous Agents and Multiagent Systems, 24-31.
Available at: https://aquila.usm.edu/fac_pubs/17156