Faculty Publications

Cooperative-Competitive Reinforcement Learning With History-Dependent Rewards

Keyang He, University of GeorgiaFollow
Bikramjit Banerjee, University of Southern MississippiFollow
Prashant Doshi, University of GeorgiaFollow

Document Type

Conference Proceeding

Publication Date

1-1-2021

School

Computing Sciences and Computer Engineering

Abstract

Consider a typical organization whose worker agents seek to collectively cooperate for its general betterment. However, each individual agent simultaneously seeks to act to secure a larger chunk than its co-workers of the annual increment in compensation, which usually comes from a fixed pot. As such, the agents in an organization must cooperate and compete. Another feature of many organizations is that a worker receives a bonus, which is often a fraction of previous year’s total profit. As such, the agent derives a reward that is also partly dependent on historical performance. How should the individual agent decide to act in this context? Few methods for the mixed cooperative-competitive setting have been presented in recent years, but these are challenged by problem domains whose reward functions additionally depend on historical information. Recent deep multi-agent reinforcement learning (MARL) methods using long short-term memory (LSTM) may be used, but these adopt a joint perspective to the interaction or require explicit exchange of information among the agents to promote cooperation, which may not be possible under competition. In this paper, we first show that the agent’s decision-making problem can be modeled as an interactive partially observable Markov decision process (I-POMDP) that captures the dynamic of a history-dependent reward. We present an interactive advantage actor-critic method (IA2C+), which combines the independent advantage actor-critic network with a belief filter that maintains a belief distribution over other agents’ models. Empirical results show that IA2C+ learns the optimal policy faster and more robustly than several baselines.

Publication Title

Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

Volume

First Page

602

Last Page

610

Recommended Citation

He, K., Banerjee, B., Doshi, P. (2021). Cooperative-Competitive Reinforcement Learning With History-Dependent Rewards. Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2, 602-610.
Available at: https://aquila.usm.edu/fac_pubs/19297

Link to Full Text

COinS

Faculty Publications

Cooperative-Competitive Reinforcement Learning With History-Dependent Rewards

Document Type

Publication Date

School

Abstract

Publication Title

Volume

First Page

Last Page

Recommended Citation

Search

Browse

Author Corner

Faculty Publications

Cooperative-Competitive Reinforcement Learning With History-Dependent Rewards

Authors

Document Type

Publication Date

School

Abstract

Publication Title

Volume

First Page

Last Page

Recommended Citation

Share

Search

Browse

Author Corner