Faculty Publications

Distributed Reinforcement Learning for Policy Synchronization In Infinite-Horizon Dec-POMDPs

Bikramjit Banerjee, University of Southern Mississippi
Landon Kraemer, University of Southern Mississippi

Document Type

Article

Publication Date

1-1-2012

School

Computing Sciences and Computer Engineering

Abstract

In many multi-agent tasks, agents face uncertainty about the environment, the outcomes of their actions, and the behaviors of other agents. Dec-POMDPs offer a powerful modeling framework for sequential, cooperative, multiagent tasks under uncertainty. Solution techniques for infinite-horizon Dec-POMDPs have assumed prior knowledge of the model and have required centralized solvers. We propose a method for learning Dec-POMDP solutions in a distributed fashion. We identify the issue of policy synchronization that distributed learners face and propose incorporating rewards into their learned model representations to ameliorate it. Most importantly, we show that even if rewards are not visible to agents during policy execution, exploiting the information contained in reward signals during learning is still beneficial.

Recommended Citation

Banerjee, B., Kraemer, L. (2012). Distributed Reinforcement Learning for Policy Synchronization In Infinite-Horizon Dec-POMDPs.
Available at: https://aquila.usm.edu/fac_pubs/17157
