Kobayashi, T. Intentionally-underestimated value function at terminal state for temporal-difference learning with mis-designed reward. Elsevier.
Chicago Style (17th ed.) Citation: Kobayashi, Taisuke. Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward. Elsevier.
MLA (9th ed.) Citation: Kobayashi, Taisuke. Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward. Elsevier.