Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks

In this paper, the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) is analyzed for deep neural network (NN) approximators of both the action value function and the controller, which are iteratively improved using collected experiences from the environment....

Bibliographic Details
Main Author: Timotei Lala
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Mathematics
Subjects: ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded
Online Access: https://www.mdpi.com/2227-7390/13/2/206
_version_ 1832588064588824576
author Timotei Lala
collection DOAJ
description In this paper, the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) is analyzed for deep neural network (NN) approximators of both the action value function and the controller, which are iteratively improved using collected experiences from the environment. Our findings extend previous research on the stability of online adaptive ADHDP learning with single-hidden-layer NNs by addressing the case of deep neural networks with an arbitrary number of hidden layers, updated offline using batched gradient descent updates. Specifically, our work shows that the learning process of the action value function and controller under BOADHDP is uniformly ultimately bounded (UUB), contingent on certain conditions related to the NN learning rates. The developed theory demonstrates an inverse relationship between the number of hidden layers and the learning rate magnitude. We present a practical implementation involving a twin rotor aerodynamical system to emphasize the difference in impact between single-hidden-layer and multiple-hidden-layer NN architectures in BOADHDP learning settings. The validation case study shows that BOADHDP with a multiple-hidden-layer NN architecture obtains 0.0034 on the control benchmark, while the single-hidden-layer NN architectures obtain 0.0049, outperforming the former by 1.58% using the same collected dataset and learning conditions. Also, BOADHDP is compared with online adaptive ADHDP, proving the superiority of the former over the latter, both in terms of controller performance and data efficiency.
format Article
id doaj-art-6b62da5782ac40eeb9543628fe5503da
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
spelling doaj-art-6b62da5782ac40eeb9543628fe5503da
2025-01-24T13:39:44Z
eng
MDPI AG
Mathematics
2227-7390
2025-01-01
13
2
206
10.3390/math13020206
Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks
Timotei Lala
Department of Automation and Applied Informatics, Politehnica University of Timisoara, 2, Bd. V. Parvan, 300223 Timisoara, Romania
https://www.mdpi.com/2227-7390/13/2/206
ADP
ADHDP
deep neural networks
batch learning
Lyapunov stability
uniformly ultimately bounded
title Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks
topic ADP
ADHDP
deep neural networks
batch learning
Lyapunov stability
uniformly ultimately bounded
url https://www.mdpi.com/2227-7390/13/2/206
work_keys_str_mv AT timoteilala stabilityanalysisofbatchofflineactiondependentheuristicdynamicprogrammingusingdeepneuralnetworks