Bayesian Q learning method with Dyna architecture and prioritized sweeping

To balance this trade-off, the Bayesian Q-learning method uses a probability distribution to describe the uncertainty of the Q values and selects actions according to that distribution. However, slow convergence is a major problem for Bayesian Q-learning. To address this problem, a novel B...
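The idea of keeping a distribution over each Q value and choosing actions from it can be sketched in a few lines. This is a simplified illustration, not the method of this paper. It assumes an independent Gaussian belief per (state, action) pair with known observation noise, a conjugate update that treats each TD target as a noisy observation, and Q-value sampling (act greedily on one sample per action) for selection. The class names and parameters (`GaussianBelief`, `BayesianQAgent`, `precision`, `obs_precision`) are hypothetical. The Dyna planning and prioritized sweeping named in the title are not shown.

```python
import random
from collections import defaultdict


class GaussianBelief:
    """Hypothetical belief over one Q(s, a): a Gaussian with known observation noise."""

    def __init__(self, mean=0.0, precision=0.01):
        self.mean = mean            # posterior mean of Q(s, a)
        self.precision = precision  # 1 / variance; a small value means high uncertainty

    def sample(self, rng):
        # Draw one plausible Q value from the current belief
        return rng.gauss(self.mean, (1.0 / self.precision) ** 0.5)

    def update(self, target, obs_precision=1.0):
        # Conjugate Normal update that treats the TD target as a noisy observation
        new_precision = self.precision + obs_precision
        self.mean = (self.precision * self.mean + obs_precision * target) / new_precision
        self.precision = new_precision


class BayesianQAgent:
    def __init__(self, actions, gamma=0.9, seed=0):
        self.actions = list(actions)
        self.gamma = gamma
        self.rng = random.Random(seed)
        self.q = defaultdict(GaussianBelief)  # beliefs keyed by (state, action)

    def select_action(self, state):
        # Q-value sampling: draw one sample per action and act greedily on the samples.
        # Actions with uncertain beliefs sometimes draw large samples, which drives exploration.
        samples = {a: self.q[(state, a)].sample(self.rng) for a in self.actions}
        return max(samples, key=samples.get)

    def learn(self, state, action, reward, next_state, done):
        # The TD target bootstraps from the posterior means at the next state
        if done:
            target = reward
        else:
            target = reward + self.gamma * max(self.q[(next_state, a)].mean for a in self.actions)
        self.q[(state, action)].update(target)


# Usage: a one-step, two-action task where "good" pays 1 and "bad" pays 0
agent = BayesianQAgent(actions=["good", "bad"], seed=0)
for _ in range(500):
    a = agent.select_action("s")
    r = 1.0 if a == "good" else 0.0
    agent.learn("s", a, r, None, done=True)

picks = [agent.select_action("s") for _ in range(200)]
print(agent.q[("s", "good")].mean, picks.count("good"))
```

Because each belief's variance shrinks only as that action is tried, exploration fades as evidence accumulates. Convergence speed then depends entirely on how much real experience each pair receives, which is the kind of slowness the abstract points to.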


Bibliographic Details
Main Authors: Jun YU, Quan LIU, Qi-ming FU, Hong-kun SUN, Gui-xing CHEN
Format: Article
Language: Chinese
Published: Editorial Department of Journal on Communications, 2013-11-01
Series: Tongxin xuebao
Online Access: http://www.joconline.com.cn/zh/article/doi/10.3969/j.issn.1000-436x.2013.11.015/