A study of value iteration and policy iteration for Markov decision processes in Deterministic systems

In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, as...

Bibliographic Details
Main Authors:	Haifeng Zheng, Dan Wang
Format:	Article
Language:	English
Published:	AIMS Press 2024-11-01
Series:	AIMS Mathematics
Subjects:	Markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion
Online Access:	https://www.aimspress.com/article/doi/10.3934/math.20241613

_version_	1832590733381468160
author	Haifeng Zheng Dan Wang
collection	DOAJ
description	In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transfer function plays a pivotal role, because the convergence criteria of these algorithms depend on the properties of the probability function governing state transitions. For VI, convergence requires verifying that the cost difference function stabilizes to a constant $k$, which ensures uniformity across iterations. In contrast, PI converges when the value function remains unchanged over successive iterations. Finally, a detailed example demonstrates the conditions under which the algorithms converge, underscoring the practicality of these methods in deterministic settings.
format	Article
id	doaj-art-2fe90045f28545bbb532a2602d8bd06d
institution	Kabale University
issn	2473-6988
language	English
publishDate	2024-11-01
publisher	AIMS Press
record_format	Article
series	AIMS Mathematics
spelling	doaj-art-2fe90045f28545bbb532a2602d8bd06d; 2025-01-23T07:53:24Z; eng; AIMS Press; AIMS Mathematics; 2473-6988; 2024-11-01; vol. 9, no. 12, pp. 33818-33842; doi:10.3934/math.20241613; A study of value iteration and policy iteration for Markov decision processes in Deterministic systems; Haifeng Zheng (School of Economics, Jinan University, Guangzhou 510632, Guangdong, China); Dan Wang (School of Economics, Jinan University, Guangzhou 510632, Guangdong, China); https://www.aimspress.com/article/doi/10.3934/math.20241613; markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion
title	A study of value iteration and policy iteration for Markov decision processes in Deterministic systems
topic	Markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion
url	https://www.aimspress.com/article/doi/10.3934/math.20241613
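
The VI convergence check mentioned in the description (the cost difference settling to a constant $k$) can be illustrated on a small finite problem. The sketch below is not the article's method: the article works on Borel spaces, whereas this toy uses three states and two actions, and every name in it (`value_iteration_average_cost`, `step`, `cost`, `STAY`, `MOVE`, the cost values) is a hypothetical choice made for illustration. It iterates v_{n+1}(x) = min_a [c(x, a) + v_n(f(x, a))] and stops once v_{n+1}(x) - v_n(x) is the same for all states.

```python
from typing import Callable, Dict, Sequence, Tuple


def value_iteration_average_cost(
    states: Sequence[int],
    actions: Sequence[int],
    step: Callable[[int, int], int],      # deterministic transition f(x, a)
    cost: Callable[[int, int], float],    # one-stage cost c(x, a)
    tol: float = 1e-9,
    max_iter: int = 10_000,
) -> Tuple[float, Dict[int, int], int]:
    """Return (k, greedy policy, iterations) once v_{n+1} - v_n is constant."""
    v = {x: 0.0 for x in states}
    for n in range(1, max_iter + 1):
        v_next: Dict[int, float] = {}
        policy: Dict[int, int] = {}
        for x in states:
            # Greedy action for the current value estimate.
            best = min(actions, key=lambda a: cost(x, a) + v[step(x, a)])
            policy[x] = best
            v_next[x] = cost(x, best) + v[step(x, best)]
        diff = [v_next[x] - v[x] for x in states]
        # Convergence test: the difference function has (numerically) zero span.
        if max(diff) - min(diff) < tol:
            return 0.5 * (max(diff) + min(diff)), policy, n
        v = v_next
    raise RuntimeError("cost difference did not stabilize to a constant")


# Hypothetical toy system: 3 states on a ring, stay in place or move forward.
STAY, MOVE = 0, 1
STAY_COST = [2.0, 1.0, 3.0]   # assumed costs; state 1 is the cheapest to hold


def step(x: int, a: int) -> int:
    return x if a == STAY else (x + 1) % 3


def cost(x: int, a: int) -> float:
    return STAY_COST[x] if a == STAY else 1.5


k, policy, n_iter = value_iteration_average_cost([0, 1, 2], [STAY, MOVE], step, cost)
print(k, policy, n_iter)
```

In this toy the self-loop at state 1 lets the differences settle; in a purely periodic deterministic system they may oscillate instead, which is why the stabilization test is checked explicitly rather than assumed.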

Similar Items