Moor: Model-based offline policy optimization with a risk dynamics model

Abstract: Offline reinforcement learning (RL) is widely used in safety-critical domains because it avoids dangerous and costly online interaction. A significant challenge is handling the uncertainties and risks that lie outside the offline data. Risk-sensitive offline RL attempts to solve this issue by risk aver...

Bibliographic Details
Main Authors: Xiaolong Su, Peng Li, Shaofei Chen
Format: Article
Language: English
Published: Springer 2024-11-01
Series: Complex & Intelligent Systems
Online Access: https://doi.org/10.1007/s40747-024-01621-x