Date of Award
5-2025
Degree Type
Honors College Thesis
Academic Program
Computer Science BS
Department
Computing
First Advisor
Bikramjit Banerjee, Ph.D.
Advisor Department
Computing
Abstract
Goal-conditioned reinforcement learning (GCRL) is an extension of reinforcement learning (RL) that focuses on goals that can be adjusted, making it useful for many applications, especially complex robotics tasks. Recent research has established that the optimal value function of GCRL, denoted Q∗(s, a, g), has a quasipseudometric structure, and this finding has led to the development of targeted neural architectures that respect such a structure. However, prior analyses have predominantly focused on sparse reward settings, which are known to worsen sample complexity. In this work, I show, with the guidance of my advisor, that the key property of a quasipseudometric, the triangle inequality, holds even when using dense rewards. This finding challenges the previous belief that dense rewards can be detrimental to GCRL performance. By identifying the crucial condition necessary for maintaining the triangle inequality, we show that dense reward functions meeting this criterion can improve, rather than hinder, sample complexity. These implications are important, especially amid the current rapid advances in AI: by enabling the training of efficient neural architectures with dense rewards, we can improve sample efficiency in GCRL. We evaluate the proposal in 12 standard GCRL benchmark environments featuring challenging continuous control tasks. Empirical results confirm that training a quasipseudometric value function in a dense reward setting indeed outperforms training with sparse rewards. The potential impact of this work lies in its ability to change how GCRL is approached in practical applications, paving the way for more effective learning strategies in robotics and beyond. While the approach has certain limitations, researchers and practitioners using this method can expect improved performance and more efficient training, ultimately advancing the field of reinforcement learning.
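For reference, the property at issue can be stated compactly. The following is a minimal sketch, assuming the convention common in the quasimetric RL literature of reading the optimal goal-conditioned value as a directed distance; the symbol w denotes an arbitrary intermediate state, and the thesis's exact construction and its condition on dense rewards are not reproduced here:

% Directed distance induced by the optimal value function
% (illustrative convention; the thesis's exact construction may differ)
\[
  d(s, g) \;:=\; -\max_{a} Q^{*}(s, a, g)
\]
% Quasipseudometric axioms satisfied by d
\[
  d(s, s) = 0 \quad \text{(reflexivity)}, \qquad
  d(s, g) \;\le\; d(s, w) + d(w, g) \quad \text{(triangle inequality, for every waypoint } w\text{)}
\]

Unlike a metric, a quasipseudometric does not require symmetry: in general d(s, g) ≠ d(g, s), which matches goal-reaching tasks where traveling from s to g can be harder than the reverse.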
Copyright
Copyright for this thesis is owned by the author. It may be freely accessed by all users. However, any reuse or reproduction not covered by the exceptions of the Fair Use or Educational Use clauses of U.S. Copyright Law or without permission of the copyright holder may be a violation of federal law. Contact the administrator if you have additional questions.
Recommended Citation
Valieva, Khadichabonu, "Quasipseudometric Value Functions with Dense Rewards" (2025). Honors Theses. 1002.
https://aquila.usm.edu/honors_theses/1002