Reinforcement Learning as a Rehearsal for Planning in Air Battle Management (RLAR)


Computing Sciences and Computer Engineering


This project leveraged recent advances in reinforcement learning (RL) to develop planners for real-time strategy games, specifically MicroRTS, in lieu of the Stratagem program's wargame. One of these advances, from the PI's lab, is called reinforcement learning as a rehearsal (RLaR). Previously, RLaR had only been evaluated on toy benchmark tasks to establish its efficacy in reducing sample complexity. This project developed RLaR for the actor-critic architecture and applied it for the first time to a complex domain with incomplete information, namely MicroRTS. Another technique applied in this project originated from the recent successes of multi-agent learning in the complex StarCraft II game: a multi-stage training architecture that develops league and league-exploiter policies during intermediate stages in order to train robust policies.

We trained RLaR against MicroPhantom, the runner-up from recent MicroRTS competitions, and showed its ability to plan effectively against this opponent while using fewer samples than relevant baselines. Separately, we trained RLaR in self-play using the four-stage training scheme and evaluated the trained policy against MentalSeal (the champion program) and MicroPhantom. While the policy once again performed well against MicroPhantom, it did not perform competently against MentalSeal. Based on an earlier preliminary finding that training against MentalSeal is extremely slow, we speculate that vastly more training time is required than we could devote to this step during the project's extended period.
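The core idea behind rehearsal-based training is that privileged information (here, the hidden state) is available during training but never at execution time. A minimal toy sketch of that idea, written here as an asymmetric actor-critic where the critic's baseline is conditioned on the hidden state while the actor sees only a noisy observation, might look as follows. The toy environment, learning rates, and episode count are all illustrative assumptions, not the project's MicroRTS setup or the exact RLaR algorithm.

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Toy partially observable task (assumed for illustration):
# hidden state s in {0, 1}; observation o equals s with prob 0.8;
# reward is 1 if the chosen action matches the hidden state.
theta = [[0.0, 0.0], [0.0, 0.0]]  # actor logits, indexed by OBSERVATION
V = [0.0, 0.0]                    # critic values, indexed by HIDDEN state

LR_ACTOR, LR_CRITIC = 0.1, 0.1

for _ in range(5000):
    s = random.randint(0, 1)                    # hidden state (training-only)
    o = s if random.random() < 0.8 else 1 - s   # noisy observation (always available)
    probs = softmax(theta[o])
    a = 0 if random.random() < probs[0] else 1
    r = 1.0 if a == s else 0.0

    # "Rehearsal": the critic/baseline uses the hidden state during training.
    adv = r - V[s]
    V[s] += LR_CRITIC * adv

    # Policy-gradient step for the observation-only actor.
    for act in range(2):
        grad = (1.0 if act == a else 0.0) - probs[act]
        theta[o][act] += LR_ACTOR * adv * grad

# At execution time only theta (the observation-conditioned actor) is needed.
policy = [softmax(theta[o]) for o in range(2)]
```

After training, the actor prefers the action matching its observation even though it never saw the hidden state, while the hidden-state critic reduced the variance of its updates.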
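The league-style multi-stage training mentioned above maintains a pool of frozen policy snapshots and matches the current learner against them. A minimal sketch of such a pool with prioritized opponent sampling follows; the class structure, snapshot names, and the squared-loss sampling rule are illustrative assumptions loosely modeled on AlphaStar-style leagues, not the project's exact scheme.

```python
import random

random.seed(1)

class League:
    """Pool of frozen opponent policies with prioritized sampling:
    the learner plays more often against snapshots it beats least."""

    def __init__(self):
        self.snapshots = []   # identifiers of frozen past policies
        self.win_rates = {}   # learner's running win rate vs each snapshot

    def add_snapshot(self, policy_id):
        self.snapshots.append(policy_id)
        self.win_rates[policy_id] = 0.5  # uninformative prior

    def record_result(self, policy_id, won, decay=0.99):
        # Exponential moving average of the learner's win rate.
        wr = self.win_rates[policy_id]
        self.win_rates[policy_id] = decay * wr + (1 - decay) * (1.0 if won else 0.0)

    def sample_opponent(self):
        # Weight each snapshot by squared losing rate, so hard opponents
        # (low win rate for the learner) are drawn most often.
        weights = [(1.0 - self.win_rates[p]) ** 2 + 1e-3 for p in self.snapshots]
        return random.choices(self.snapshots, weights=weights)[0]

# Illustrative usage: three hypothetical snapshots with assumed win rates.
league = League()
for pid, wr in [("main_v1", 0.9), ("main_v2", 0.5), ("exploiter_v1", 0.1)]:
    league.add_snapshot(pid)
    league.win_rates[pid] = wr

samples = [league.sample_opponent() for _ in range(1000)]
```

With these weights the hardest snapshot dominates the matchmaking, which is the mechanism that pushes league and league-exploiter training toward robust policies rather than policies that only beat one fixed opponent.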
