Optimized TD3 algorithm for robust autonomous navigation in crowded and dynamic human-interaction environments
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Elsevier, 2024-12-01 |
| Series: | Results in Engineering |
| Subjects: | |
| Online Access: | http://www.sciencedirect.com/science/article/pii/S2590123024011290 |
| Summary: | Mobile robots have been incorporated into human society to help perform tasks that can affect or endanger health and life. One of the challenges lies in mobile robot-human interaction, as unexpected movements made by people can cause collisions with the robots while they accomplish a task. This paper proposes an optimized Deep Reinforcement Learning algorithm based on TD3 (Twin Delayed Deep Deterministic Policy Gradient), which allows the robot to take its own actions based on the observations it makes, without any trajectory being defined beforehand. The algorithm uses an actor-critic policy to determine the robot's linear and angular velocities, allowing it to move through unknown dynamic environments while avoiding collisions. A buffer is proposed that stores the values produced by the neural network and analyses them together with the robot's odometry parameters, so that the best decision is sent to the robot to achieve a collision-free path and meet the objectives. The purpose of the algorithm is to reach as many consecutive targets as possible; that is, the robot never returns to its initial position and is assigned new targets regardless of the position it has reached. Finally, after training for 12,000 episodes, the actor and critic training losses converge to nearly 0, and the evaluation shows a 92 % effectiveness of the algorithm, based on 772 steps performed by the robot in a time of 11 s. |
|---|---|
| ISSN: | 2590-1230 |
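
To make the summary concrete, the sketch below illustrates the core TD3 machinery the abstract refers to: twin critics, target policy smoothing, and delayed actor updates, with a replay buffer and a 2-D action for linear and angular velocity. This is a minimal illustrative sketch of standard TD3, not the authors' implementation; the observation and action dimensions, network sizes, and hyperparameters are all assumptions.

```python
# Minimal TD3 sketch (hypothetical; not the paper's code). An actor maps a
# laser-scan + odometry observation to a 2-D action (linear and angular
# velocity); twin critics score state-action pairs; a replay buffer holds
# transitions for off-policy updates. Dimensions and hyperparameters are
# illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 24, 2  # assumed: 20 laser rays + 4 odometry values; action = (v, w)

class Actor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM), nn.Tanh(),  # velocities scaled to [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """One of TD3's twin Q-networks; the min over the pair curbs overestimation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + ACT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

# Replay buffer of (obs, act, reward, next_obs, done) tensors,
# filled during rollouts via buffer.append((obs, act, rew, next_obs, done)).
buffer = deque(maxlen=100_000)

def td3_update(actor, actor_tgt, critics, critic_tgts, opt_actor, opt_critics,
               step, batch_size=128, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    obs, act, rew, nxt, done = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise on the target action.
        eps = (torch.randn_like(act) * noise_std).clamp(-noise_clip, noise_clip)
        nxt_act = (actor_tgt(nxt) + eps).clamp(-1.0, 1.0)
        # Clipped double-Q target: the smaller of the twin target critics.
        q_tgt = torch.min(critic_tgts[0](nxt, nxt_act), critic_tgts[1](nxt, nxt_act))
        y = rew + gamma * (1.0 - done) * q_tgt
    critic_loss = sum(((c(obs, act) - y) ** 2).mean() for c in critics)
    opt_critics.zero_grad(); critic_loss.backward(); opt_critics.step()
    if step % 2 == 0:  # delayed actor update, the "Delayed" in TD3
        actor_loss = -critics[0](obs, actor(obs)).mean()
        opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
        # (Soft/Polyak updates of the target networks are omitted for brevity.)
```

The minimum over the twin critics and the delayed actor step are the two modifications that distinguish TD3 from DDPG; the buffer-plus-odometry decision filtering described in the summary is specific to the paper and not reproduced here.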