Bayesian Q learning method with Dyna architecture and prioritized sweeping
To balance this trade-off, the Bayesian Q-learning method uses a probability distribution to describe the uncertainty of the Q value and selects actions according to that distribution. However, slow convergence is a major problem for Bayesian Q-learning. To address these problems, a novel B...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | zho |
Published: | Editorial Department of Journal on Communications, 2013-11-01 |
Series: | Tongxin xuebao |
Online Access: | http://www.joconline.com.cn/zh/article/doi/10.3969/j.issn.1000-436x.2013.11.015/ |