Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs
Document Type
Conference Proceeding
Publication Date
12-1-2013
School
Computing Sciences and Computer Engineering
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently, a Monte Carlo based distributed reinforcement learning approach was proposed, where agents take turns learning best responses to each other's policies. This promotes decentralization of the policy computation problem and relaxes reliance on full knowledge of the problem parameters. However, this Monte Carlo approach has a large sample complexity, which we address in this paper. In particular, we propose and analyze a modified version of the previous algorithm that adaptively eliminates parts of the experience tree from further exploration, thus requiring fewer samples while ensuring unchanged confidence in the learned value function. Experiments demonstrate a significant reduction in sample complexity, with maximum reductions ranging from 61% to 91% across different benchmark Dec-POMDP problems, and the final policies are often better due to more focused exploration. Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
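The abstract describes adaptively eliminating parts of the experience tree from further exploration while preserving confidence in the learned values. The sketch below illustrates one way such pruning can work in a Monte Carlo setting; it is a minimal, hypothetical illustration only, not the paper's algorithm. The class name, the Hoeffding-style confidence radius, and the parameter `delta` are assumptions chosen for exposition.

```python
# Hypothetical sketch: confidence-based pruning of actions at a node of an
# agent's experience tree during Monte Carlo learning. Illustrative only;
# the bound, names, and parameters are not taken from the paper.
import math
import random


class HistoryNode:
    """One node of an experience tree, i.e., one observation history."""

    def __init__(self, actions, value_range=1.0):
        self.actions = list(actions)
        self.pruned = set()                      # actions removed from exploration
        self.counts = {a: 0 for a in actions}    # samples per action
        self.q = {a: 0.0 for a in actions}       # Monte Carlo value estimates
        self.value_range = value_range           # assumed bound on returns

    def update(self, action, sampled_return):
        """Incorporate one sampled return for `action` at this history."""
        self.counts[action] += 1
        n = self.counts[action]
        self.q[action] += (sampled_return - self.q[action]) / n

    def radius(self, action, delta=0.05):
        """Hoeffding-style confidence radius (an illustrative choice)."""
        n = max(self.counts[action], 1)
        return self.value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

    def prune(self, delta=0.05):
        """Eliminate actions whose upper bound falls below the best lower bound."""
        active = [a for a in self.actions if a not in self.pruned]
        if len(active) <= 1:
            return
        best_lower = max(self.q[a] - self.radius(a, delta) for a in active)
        for a in active:
            if self.q[a] + self.radius(a, delta) < best_lower:
                self.pruned.add(a)

    def sample_action(self):
        """Explore uniformly among the actions that remain un-pruned."""
        active = [a for a in self.actions if a not in self.pruned]
        return random.choice(active)
```

The intended effect mirrors the abstract's claim: once an action's confidence interval lies entirely below that of the best alternative, its branch is no longer sampled, so the remaining sample budget concentrates on the branches that can still affect the learned value function.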
Publication Title
Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013
First Page
88
Last Page
94
Recommended Citation
Banerjee, B. (2013). Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs. Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013, 88-94.
Available at: https://aquila.usm.edu/fac_pubs/20262