A proximal policy optimization based deep reinforcement learning framework for tracking control of a flexible robotic manipulator

This paper puts forward a policy feedback based deep reinforcement learning (DRL) control scheme for a partially observable system by leveraging the potential of the proximal policy optimization (PPO) algorithm and a convolutional neural network (CNN). Although several DRL algorithms have been investigat...
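
For readers unfamiliar with PPO, the sketch below illustrates the standard clipped surrogate objective that PPO maximizes using first-order gradient steps. It is not the collaborative variant (CPPO) proposed in the article, whose update rule is not given in this record. The function name `ppo_clip_objective`, its argument names, and the default `clip_eps=0.2` are assumptions chosen for illustration only.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. advantages: advantage estimates.
    clip_eps is an assumed illustrative value, not taken from the article.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Hypothetical batch of three transitions.
value = ppo_clip_objective(
    new_logp=[-0.9, -1.2, -0.4],
    old_logp=[-1.0, -1.0, -0.5],
    advantages=[0.5, -0.3, 1.0],
)
print(value)
```

Taking the minimum of the clipped and unclipped terms removes the incentive to move the policy far from the one that collected the data, which is what allows PPO to rely on plain first-order optimization instead of a constrained second-order step.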

Bibliographic Details
Main Authors:	Joshi Kumar V, Vinodh Kumar Elumalai
Format:	Article
Language:	English
Published:	Elsevier 2025-03-01
Series:	Results in Engineering
Subjects:	Deep reinforcement learning; Proximal policy gradient; Policy feedback; Flexible joint manipulator; Vibration suppression
Online Access:	http://www.sciencedirect.com/science/article/pii/S2590123025002646

_version_	1832087364090986496
author	Joshi Kumar V; Vinodh Kumar Elumalai
collection	DOAJ
description	This paper puts forward a policy feedback based deep reinforcement learning (DRL) control scheme for a partially observable system by leveraging the potential of the proximal policy optimization (PPO) algorithm and a convolutional neural network (CNN). Although several DRL algorithms have been investigated for fully observable systems, there have been limited studies on devising DRL control for a partially observable system with uncertain dynamics. Moreover, the major limitation of existing policy gradient based DRL techniques is that they are computationally expensive and suffer from scalability issues for complex higher-order systems. Hence, in this study, we adopt the PPO technique, which utilizes first-order optimization to minimize the computational complexity, and devise a DRL scheme for a partially observable flexible link robot manipulator system. Specifically, to improve stability and convergence in the PPO algorithm, this study adopts a collaborative policy approach in the update of the value function and presents a collaborative proximal policy optimization (CPPO) algorithm that can address the tracking control and vibration suppression problems in a partially observable robotic manipulator system. Identifying the optimal hyper-parameters of DRL using the grid search method, we exploit the capability of the CNN in an actor-critic architecture to extract the spatial dependencies in the state sequences of the dynamical system and boost the DRL performance. To improve the convergence of the proposed DRL algorithm, this study adopts the Lyapunov based reward shaping technique. The experimental validation on the robotic manipulator system through hardware-in-the-loop (HIL) testing substantiates that the proposed framework offers faster convergence and better vibration suppression compared to the state-of-the-art policy gradient and actor-critic techniques.
format	Article
id	doaj-art-579e66b48025427db8e32c85d6e177d1
institution	Kabale University
issn	2590-1230
language	English
publishDate	2025-03-01
publisher	Elsevier
record_format	Article
series	Results in Engineering
spelling	doaj-art-579e66b48025427db8e32c85d6e177d1 | 2025-02-06T05:12:42Z | eng | Elsevier | Results in Engineering | 2590-1230 | 2025-03-01 | 25 | 104178 | A proximal policy optimization based deep reinforcement learning framework for tracking control of a flexible robotic manipulator | Joshi Kumar V (0) | Vinodh Kumar Elumalai (1) | School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, 632014, India | Corresponding author.; School of Electrical Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, 632014, India | http://www.sciencedirect.com/science/article/pii/S2590123025002646 | Deep reinforcement learning | Proximal policy gradient | Policy feedback | Flexible joint manipulator | Vibration suppression
title	A proximal policy optimization based deep reinforcement learning framework for tracking control of a flexible robotic manipulator
topic	Deep reinforcement learning; Proximal policy gradient; Policy feedback; Flexible joint manipulator; Vibration suppression
url	http://www.sciencedirect.com/science/article/pii/S2590123025002646

Similar Items