Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs

Document Type

Conference Proceeding

Publication Date

12-1-2013

School

Computing Sciences and Computer Engineering

Abstract

Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently, a Monte Carlo-based distributed reinforcement learning approach was proposed, in which agents take turns learning best responses to each other's policies. This promotes decentralization of the policy computation problem and relaxes reliance on full knowledge of the problem parameters. However, this Monte Carlo approach has a large sample complexity, which we address in this paper. In particular, we propose and analyze a modified version of the previous algorithm that adaptively eliminates parts of the experience tree from further exploration, thus requiring fewer samples while ensuring unchanged confidence in the learned value function. Experiments demonstrate a significant reduction in sample complexity, with maximum reductions ranging from 61% to 91% across different benchmark Dec-POMDP problems, and the final policies are often better due to more focused exploration. Copyright © 2013, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
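
The sketch below illustrates only the general idea referenced in the abstract: adaptively eliminating branches of an experience tree once their estimated value is confidently dominated, so that further samples concentrate on the remaining branches. It is not the paper's algorithm; the class name, the Hoeffding-style bound, and parameters such as delta and value_range are illustrative assumptions.

```python
# Illustrative sketch only (assumed names and bound, not the paper's method):
# eliminate action branches at one node of an experience tree when their upper
# confidence bound falls below the best branch's lower confidence bound.
import math
from collections import defaultdict

class NodePruner:
    """Per-action return statistics at a single tree node, with pruning."""

    def __init__(self, actions, delta=0.05, value_range=1.0):
        self.delta = delta                # failure probability for the bounds
        self.value_range = value_range    # assumed range of sampled returns
        self.active = set(actions)        # branches still open to exploration
        self.counts = defaultdict(int)
        self.sums = defaultdict(float)

    def update(self, action, sampled_return):
        # Record one Monte Carlo return sampled under this action branch.
        self.counts[action] += 1
        self.sums[action] += sampled_return

    def _mean(self, a):
        return self.sums[a] / self.counts[a]

    def _radius(self, a):
        # Hoeffding confidence radius for returns bounded by value_range.
        n = self.counts[a]
        return self.value_range * math.sqrt(math.log(2.0 / self.delta) / (2.0 * n))

    def prune(self):
        """Discard branches whose upper bound is below the best lower bound."""
        sampled = [a for a in self.active if self.counts[a] > 0]
        if len(sampled) < 2:
            return
        best_lower = max(self._mean(a) - self._radius(a) for a in sampled)
        for a in list(self.active):
            if self.counts[a] > 0 and self._mean(a) + self._radius(a) < best_lower:
                self.active.discard(a)    # eliminated from further sampling
```

In this reading, sampling effort at each node is spent only on actions remaining in the active set, which is how pruning reduces overall sample complexity while the confidence parameter keeps the value estimates' guarantees unchanged.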

Publication Title

Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013

First Page

88

Last Page

94
