Biasing Exploration towards Positive Error for Efficient Reinforcement Learning
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | LibraryPress@UF, 2025-05-01 |
| Series: | Proceedings of the International Florida Artificial Intelligence Research Society Conference |
| Subjects: | |
| Online Access: | https://journals.flvc.org/FLAIRS/article/view/138835 |
| Summary: | Efficient exploration remains a critical challenge in Reinforcement Learning (RL), significantly affecting sample efficiency. This paper demonstrates that biasing exploration towards state-action pairs with positive temporal difference error speeds up convergence and, in some challenging environments, has the potential to result in an improved policy. We show that this Positive Error Bias (PEB) method achieves statistically significant performance improvements across various tasks and estimators. Empirical results demonstrate PEB’s effectiveness in bandits, grid worlds, and classic control tasks with exact and approximate estimators. PEB is particularly effective when unbiased exploration struggles with policy discovery.
| ISSN: | 2334-0754, 2334-0762 |
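The summary above describes biasing exploration towards state-action pairs with positive temporal difference (TD) error. The record does not include the paper's algorithm, so the following is only a minimal sketch of one plausible reading: tabular Q-learning on a hypothetical toy chain MDP, where exploratory steps prefer actions whose last observed TD error was positive. The environment, hyperparameters, and the exact biasing rule are assumptions for illustration, not the authors' PEB implementation.

```python
# Minimal sketch of TD-error-biased exploration in tabular Q-learning.
# NOT the paper's implementation: the environment and the exact rule for
# preferring positive-TD-error actions are assumptions made here.
import random

import numpy as np

# Hypothetical toy chain MDP: states 0..N-1, actions {0: left, 1: right},
# reward 1 only on reaching the rightmost state.
N_STATES, N_ACTIONS = 8, 2


def step(state, action):
    """Deterministic chain dynamics (illustrative only)."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done


def biased_q_learning(episodes=500, alpha=0.1, gamma=0.99, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    # Last observed TD error per state-action pair; used only to bias which
    # action is chosen on exploratory steps (the assumed biasing mechanism).
    td_err = np.zeros((N_STATES, N_ACTIONS))

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Exploratory step: prefer actions whose last TD error was
                # positive; fall back to a uniform choice if none qualify.
                positive = [a for a in range(N_ACTIONS) if td_err[state, a] > 0]
                action = rng.choice(positive) if positive else rng.randrange(N_ACTIONS)
            else:
                action = int(np.argmax(q[state]))

            next_state, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * np.max(q[next_state]))
            delta = target - q[state, action]
            td_err[state, action] = delta      # remember the TD error
            q[state, action] += alpha * delta  # standard Q-learning update
            state = next_state
    return q


if __name__ == "__main__":
    print(np.round(biased_q_learning(), 2))
```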