MOOR: Model-based offline policy optimization with a risk dynamics model
Main Authors:
Format: Article
Language: English
Published: Springer, 2024-11-01
Series: Complex & Intelligent Systems
Subjects:
Online Access: https://doi.org/10.1007/s40747-024-01621-x
Summary: Offline reinforcement learning (RL) is widely used in safety-critical domains because it avoids dangerous and costly online interaction. A significant challenge is addressing uncertainties and risks that lie outside the offline data. Risk-sensitive offline RL attempts to solve this issue through risk aversion. However, current model-based approaches extract only state-transition and reward information with their dynamics models; they cannot capture the risk information implicit in offline data and may therefore misuse high-risk data. In this work, we propose a model-based offline policy optimization approach with a risk dynamics model (MOOR). Specifically, we construct a risk dynamics model using a quantile network that learns the risk information in the data, then reshape model-generated data based on the errors of the risk dynamics model and the risk information of the data. Finally, we use a risk-averse algorithm to learn a policy on the combined dataset of offline and generated data. We theoretically prove that MOOR can identify the risk information of data and avoid utilizing high-risk data. Our experiments show that MOOR outperforms existing approaches and achieves state-of-the-art results on risk-sensitive D4RL and risky navigation tasks.
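The "quantile network" mentioned in the summary refers to quantile regression: a network trained with the asymmetric pinball loss estimates a chosen quantile of a target distribution (e.g. the risky upper tail of a cost) rather than its mean. Below is a minimal, illustrative sketch of that loss; it is not the paper's code, and the function name and toy values are hypothetical.

```python
def pinball_loss(pred: float, target: float, tau: float) -> float:
    """Quantile (pinball) loss for quantile level tau in (0, 1).

    The penalty is asymmetric: under-predictions are weighted by tau,
    over-predictions by (1 - tau), so minimizing the average of this
    loss over samples drives pred toward the tau-quantile of the data.
    """
    diff = target - pred
    return max(tau * diff, (tau - 1.0) * diff)


# Toy usage: a high tau (e.g. 0.9) penalizes under-prediction heavily,
# pulling the estimate toward the risky upper tail of the samples.
samples = [1.0, 2.0, 3.0, 10.0]  # hypothetical costs with a heavy tail

def total_loss(pred: float, tau: float) -> float:
    return sum(pinball_loss(pred, s, tau) for s in samples)
```

With `tau = 0.5` the minimizer of `total_loss` sits near the median of the samples; with `tau = 0.9` it shifts toward the tail value, which is the behavior a risk dynamics model needs to flag high-risk data.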
ISSN: 2199-4536, 2198-6053