A study of value iteration and policy iteration for Markov decision processes in Deterministic systems
Main Authors: | Haifeng Zheng, Dan Wang |
---|---|
Format: | Article |
Language: | English |
Published: | AIMS Press, 2024-11-01 |
Series: | AIMS Mathematics |
Online Access: | https://www.aimspress.com/article/doi/10.3934/math.20241613 |
author | Haifeng Zheng, Dan Wang |
---|---|
collection | DOAJ |
description | In the context of deterministic discrete-time control systems, we examined the implementation of the value iteration (VI) and policy iteration (PI) algorithms for Markov decision processes (MDPs) on Borel spaces. The deterministic nature of the system's transition function plays a pivotal role, because the convergence criteria of these algorithms are closely tied to the properties of the transition probability, which in a deterministic system is concentrated on a single successor state. For VI, convergence is contingent upon verifying that the cost difference function stabilizes to a constant $k$, ensuring uniformity across iterations. In contrast, PI converges when the value function remains unchanged over successive iterations. Finally, a detailed example demonstrates the conditions under which the algorithms converge, underscoring the practicality of these methods in deterministic settings. |
institution | Kabale University |
issn | 2473-6988 |
citation | AIMS Mathematics, 2024, 9(12): 33818–33842. DOI: 10.3934/math.20241613 |
affiliation | School of Economics, Jinan University, Guangzhou 510632, Guangdong, China (Haifeng Zheng and Dan Wang) |
topic | Markov decision processes; deterministic system; value iteration; policy iteration; average cost criterion |
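The VI convergence test described in the abstract (the cost difference function settling to a constant $k$) can be illustrated on a small finite system. This is a minimal sketch under stated assumptions, not the authors' method: it replaces the paper's Borel spaces with finite state and action sets, and the names `value_iteration`, `System`, `tol`, `max_iter` and both example systems are hypothetical. It runs undiscounted average-cost VI, $v_{n+1}(x) = \min_a [c(x,a) + v_n(F(x,a))]$ with $v_0 = 0$, and stops once $v_{n+1}(x) - v_n(x)$ takes the same value $k$ at every state.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding of a finite deterministic system: state x maps to a
# list of (cost c(x, a), successor F(x, a)) pairs, one per admissible action a.
System = Dict[int, List[Tuple[float, int]]]


def value_iteration(system: System, tol: float = 1e-9,
                    max_iter: int = 1000) -> Tuple[Optional[float], int]:
    """Undiscounted average-cost VI with a constant-difference stopping test.

    Returns (k, n), where k is the common value of v_n(x) - v_{n-1}(x) over
    all states and n is the iteration at which it was reached, or
    (None, max_iter) if the differences never became uniform within tol.
    """
    v = {x: 0.0 for x in system}  # v_0 = 0
    for n in range(1, max_iter + 1):
        # Bellman update: each action has exactly one successor, so no expectation.
        v_next = {x: min(c + v[y] for c, y in system[x]) for x in system}
        diffs = [v_next[x] - v[x] for x in system]
        v = v_next
        # Stop when the cost difference is (numerically) the same at every state.
        if max(diffs) - min(diffs) <= tol:
            return diffs[0], n
    return None, max_iter


if __name__ == "__main__":
    # State 1 has a cheap self-loop (cost 0.5) that every state can reach.
    aperiodic = {0: [(2.0, 0), (1.0, 1)], 1: [(0.5, 1), (3.0, 2)], 2: [(1.0, 0)]}
    # A forced two-state cycle with costs 1 and 3.
    periodic = {0: [(1.0, 1)], 1: [(3.0, 0)]}
    print(value_iteration(aperiodic))              # (0.5, 3)
    print(value_iteration(periodic, max_iter=50))  # (None, 50)
```

In the first system the differences settle at $k = 0.5$ after three iterations, matching the cost of the cheapest reachable cycle (the self-loop at state 1). In the second, the differences alternate between 1 and 3 indefinitely, so the test never passes even though the long-run average cost is 2. This shows why, in deterministic systems, stabilization of the cost difference has to be verified rather than assumed.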