Intentionally-underestimated value function at terminal state for temporal-difference learning with mis-designed reward

Robot control using reinforcement learning has become popular, but its learning process often terminates midway through an episode for safety and time-saving reasons. This study addresses the problem of the most popular exception handling that temporal-difference (TD) learning performs at such termi...

Bibliographic Details
Main Author:	Taisuke Kobayashi
Format:	Article
Language:	English
Published:	Elsevier 2025-03-01
Series:	Results in Control and Optimization
Subjects:	Temporal-difference learning; Arbitrariness of reward design; Exception handling at episode termination; Intentional underestimation of terminal value
Online Access:	http://www.sciencedirect.com/science/article/pii/S2666720725000165

Similar Items