Intentionally-underestimated value function at terminal state for temporal-difference learning with mis-designed reward

Robot control using reinforcement learning has become popular, but its learning process often terminates midway through an episode for safety and time-saving reasons. This study addresses the problem of the most popular exception handling that temporal-difference (TD) learning performs at such termi...

Bibliographic Details
Main Author: Taisuke Kobayashi
Format: Article
Language: English
Published: Elsevier 2025-03-01
Series: Results in Control and Optimization
Online Access: http://www.sciencedirect.com/science/article/pii/S2666720725000165