Directly Attention Loss Adjusted Prioritized Experience Replay
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-04-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s40747-025-01852-6 |
| Summary: | Abstract Prioritized Experience Replay enables the model to learn more from relatively important samples by artificially changing their access frequencies. However, this non-uniform sampling shifts the state-action distribution originally used to estimate Q-value functions, which introduces estimation deviation. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample screening criteria and enhance training efficiency. To verify the effectiveness of DALAP, a realistic multi-USV environment based on Unreal Engine is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance. |
| ISSN: | 2199-4536, 2198-6053 |
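The summary above presupposes familiarity with how Prioritized Experience Replay skews the sampling distribution and why a correction is needed. The sketch below shows the standard proportional PER buffer with importance-sampling weights, i.e. the baseline correction whose estimation deviation DALAP aims to quantify directly; it is not the paper's DALAP implementation, and the class and parameter names (`PrioritizedReplayBuffer`, `alpha`, `beta`, `eps`) are illustrative assumptions.

```python
# Minimal sketch of proportional Prioritized Experience Replay with the
# conventional importance-sampling correction. Illustrative only; it does
# not implement the paper's Parallel Self-Attention compensation.
import numpy as np


class PrioritizedReplayBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling
        self.beta = beta        # how strongly IS weights undo that skew
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # New samples get the current maximum priority so they are seen at least once.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        prios = self.priorities[:len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights compensate for the shifted sampling
        # distribution, i.e. the deviation the abstract describes.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        batch = [self.data[i] for i in idx]
        return batch, idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority is proportional to the magnitude of the TD error.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a typical training loop, the returned `weights` multiply each transition's TD loss before the gradient step, and `update_priorities` is called afterwards with the new TD errors.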