Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks

In this paper, the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) is analyzed for deep neural network (NN) approximators for both the action value function and controller which are iteratively improved using collected experiences from the environment....

Bibliographic Details
Main Author:	Timotei Lala
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Mathematics
Subjects:	ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded
Online Access:	https://www.mdpi.com/2227-7390/13/2/206

author	Timotei Lala
collection	DOAJ
description	In this paper, the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) is analyzed for deep neural network (NN) approximators for both the action value function and controller, which are iteratively improved using collected experiences from the environment. Our findings extend previous research on the stability of online adaptive ADHDP learning with single-hidden-layer NNs by addressing the case of deep neural networks with an arbitrary number of hidden layers, updated offline using batched gradient descent updates. Specifically, our work shows that the learning process of the action value function and controller under BOADHDP is uniformly ultimately bounded (UUB), contingent on certain conditions related to NN learning rates. The developed theory demonstrates an inverse relationship between the number of hidden layers and the learning rate magnitude. We present a practical implementation involving a twin rotor aerodynamical system to emphasize the difference in impact between single-hidden-layer and multiple-hidden-layer NN architectures in BOADHDP learning settings. The validation case study shows that BOADHDP with a multiple-hidden-layer NN architecture obtains 0.0034 on the control benchmark, while the single-hidden-layer NN architectures obtain 0.0049, outperforming the former by 1.58% by using the same collected dataset and learning conditions. Also, BOADHDP is compared with online adaptive ADHDP, proving the superiority of the former over the latter, both in terms of controller performance and data efficiency.
format	Article
id	doaj-art-6b62da5782ac40eeb9543628fe5503da
institution	Kabale University
issn	2227-7390
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Mathematics
spelling	doaj-art-6b62da5782ac40eeb9543628fe5503da; 2025-01-24T13:39:44Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-01-01; 13; 2; 206; 10.3390/math13020206; Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks; Timotei Lala; Department of Automation and Applied Informatics, Politehnica University of Timisoara, 2, Bd. V. Parvan, 300223 Timisoara, Romania; https://www.mdpi.com/2227-7390/13/2/206; ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded
title	Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks
topic	ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded
url	https://www.mdpi.com/2227-7390/13/2/206


Similar Items