Biasing Exploration towards Positive Error for Efficient Reinforcement Learning

Bibliographic Details
Main Authors: Adam Parker, John Sheppard
Format: Article
Language: English
Published: LibraryPress@UF 2025-05-01
Series: Proceedings of the International Florida Artificial Intelligence Research Society Conference
Online Access: https://journals.flvc.org/FLAIRS/article/view/138835
Description
Summary: Efficient exploration remains a critical challenge in Reinforcement Learning (RL), significantly affecting sample efficiency. This paper demonstrates that biasing exploration towards state-action pairs with positive temporal difference error speeds up convergence and, in some challenging environments, can yield an improved policy. We show that this Positive Error Bias (PEB) method achieves statistically significant performance improvements across various tasks and estimators. Empirical results demonstrate PEB’s effectiveness in bandits, grid worlds, and classic control tasks with exact and approximate estimators. PEB is particularly effective when unbiased exploration struggles with policy discovery.
ISSN: 2334-0754, 2334-0762
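
As context for the abstract above, the following is a minimal illustrative sketch (Python/NumPy) of what biasing exploration toward positive temporal difference error can look like in a tabular Q-learning agent. The class name, weighting scheme, and hyperparameters are hypothetical and are not taken from the paper; the sketch only shows the general idea of revisiting state-action pairs whose most recent TD error was positive.

import numpy as np

class PEBQLearner:
    """Tabular Q-learning with exploration biased toward positive TD error (illustrative only)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1):
        self.Q = np.zeros((n_states, n_actions))    # action-value estimates
        self.td = np.zeros((n_states, n_actions))   # last observed TD error per state-action pair
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions

    def act(self, state, rng):
        if rng.random() < self.epsilon:
            # Exploration step: weight actions by their clipped (positive) TD error,
            # so pairs that recently looked better than expected are revisited first.
            pos = np.maximum(self.td[state], 0.0)
            if pos.sum() > 0.0:
                return int(rng.choice(self.n_actions, p=pos / pos.sum()))
            return int(rng.integers(self.n_actions))  # fall back to uniform exploration
        return int(np.argmax(self.Q[state]))          # greedy exploitation

    def update(self, s, a, r, s_next, done):
        # Standard one-step TD error; its sign drives the exploration bias above.
        target = r if done else r + self.gamma * np.max(self.Q[s_next])
        delta = target - self.Q[s, a]
        self.td[s, a] = delta
        self.Q[s, a] += self.alpha * delta

Hypothetical usage with integer-indexed states and actions: create the agent with PEBQLearner(n_states=16, n_actions=4), draw actions with act(state, np.random.default_rng(0)), and call update(s, a, r, s_next, done) after each environment step.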