Concurrent Reinforcement Learning As a Rehearsal for Decentralized Planning Under Uncertainty
Computing Sciences and Computer Engineering
Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Reinforcement learning (RL) based approaches have recently been proposed for the distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. This assumption may not always be necessary and may make learning difficult. We propose a novel RL approach in which agents rehearse with information that will not be available during policy execution, yet learn policies that do not explicitly rely on this information. We show experimentally that incorporating such information can ease the difficulties faced by non-rehearsal-based learners, and demonstrate fast, (near-)optimal performance on many existing benchmark Dec-POMDP problems.
Proceedings of the 12th International Conference On Autonomous Agents and Multiagent Systems
(2013). Concurrent Reinforcement Learning As a Rehearsal for Decentralized Planning Under Uncertainty. Proceedings of the 12th International Conference On Autonomous Agents and Multiagent Systems.
Available at: https://aquila.usm.edu/fac_pubs/19354