Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks
In this paper, the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) is analyzed for deep neural network (NN) approximators of both the action value function and the controller, which are iteratively improved using experiences collected from the environment....
Main Author: | Timotei Lala |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Mathematics |
Subjects: | ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded |
Online Access: | https://www.mdpi.com/2227-7390/13/2/206 |
_version_ | 1832588064588824576 |
---|---|
author | Timotei Lala |
author_facet | Timotei Lala |
author_sort | Timotei Lala |
collection | DOAJ |
description | In this paper, the theoretical stability of batch offline action-dependent heuristic dynamic programming (BOADHDP) is analyzed for deep neural network (NN) approximators of both the action value function and the controller, which are iteratively improved using experiences collected from the environment. Our findings extend previous research on the stability of online adaptive ADHDP learning with single-hidden-layer NNs by addressing the case of deep neural networks with an arbitrary number of hidden layers, updated offline using batched gradient descent updates. Specifically, our work shows that the learning process of the action value function and controller under BOADHDP is uniformly ultimately bounded (UUB), contingent on certain conditions related to the NN learning rates. The developed theory demonstrates an inverse relationship between the number of hidden layers and the learning rate magnitude. We present a practical implementation involving a twin rotor aerodynamical system to emphasize the difference in impact between single-hidden-layer and multiple-hidden-layer NN architectures in BOADHDP learning settings. The validation case study shows that BOADHDP with a multiple-hidden-layer NN architecture obtains 0.0034 on the control benchmark, while the single-hidden-layer NN architecture obtains 0.0049; the multiple-hidden-layer architecture thus outperforms the single-hidden-layer one, by a reported 1.58%, using the same collected dataset and learning conditions. BOADHDP is also compared with online adaptive ADHDP, showing the superiority of the former over the latter in terms of both controller performance and data efficiency. |
format | Article |
id | doaj-art-6b62da5782ac40eeb9543628fe5503da |
institution | Kabale University |
issn | 2227-7390 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj-art-6b62da5782ac40eeb9543628fe5503da; 2025-01-24T13:39:44Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2025-01-01; vol. 13, no. 2, art. 206; DOI 10.3390/math13020206; Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks; Timotei Lala (Department of Automation and Applied Informatics, Politehnica University of Timisoara, 2, Bd. V. Parvan, 300223 Timisoara, Romania); https://www.mdpi.com/2227-7390/13/2/206; ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded |
spellingShingle | Timotei Lala Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks Mathematics ADP ADHDP deep neural networks batch learning Lyapunov stability uniformly ultimately bounded |
title | Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks |
title_full | Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks |
title_fullStr | Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks |
title_full_unstemmed | Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks |
title_short | Stability Analysis of Batch Offline Action-Dependent Heuristic Dynamic Programming Using Deep Neural Networks |
title_sort | stability analysis of batch offline action dependent heuristic dynamic programming using deep neural networks |
topic | ADP; ADHDP; deep neural networks; batch learning; Lyapunov stability; uniformly ultimately bounded |
url | https://www.mdpi.com/2227-7390/13/2/206 |
work_keys_str_mv | AT timoteilala stabilityanalysisofbatchofflineactiondependentheuristicdynamicprogrammingusingdeepneuralnetworks |