Swarm Confrontation Algorithm for UGV Swarm with Quantity Advantage by a Novel MSRM-MAPOCA Training Method

Bibliographic Details
Main Authors: Huanli Gao, Chongming Zhao, Xinghe Yu, Shuangfei Ren, He Cai
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Actuators
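Subjects: swarm confrontation; multi-agent reinforcement learning; macro states reward mechanism; smooth reward; unmanned ground vehicle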
Online Access: https://www.mdpi.com/2076-0825/14/1/15
author Huanli Gao
Chongming Zhao
Xinghe Yu
Shuangfei Ren
He Cai
collection DOAJ
description This paper considers the swarm confrontation problem for two teams of unmanned ground vehicles (UGVs). Unlike most existing works, in which the two teams are identical, we consider two heterogeneous teams: one has the quantity advantage, while the other has the resilience advantage. Standard tests nevertheless verify that the overall capabilities of the two teams are almost the same. The objective of this article is to design a swarm confrontation algorithm for the team with the quantity advantage, based on multi-agent reinforcement learning. To address the sparse-reward issue, which results in inefficient learning and poor training performance, a novel macro states reward mechanism based on multi-agent posthumous credit assignment (MSRM-MAPOCA) is proposed. Together with a fine-tuned smooth reward design, it fully exploits the advantage in quantity and leads to outstanding training performance. Comprehensive direct and indirect comparative tests conducted on the Unity 3D platform show that the proposed swarm confrontation algorithm outperforms other classic and up-to-date swarm confrontation algorithms in both win rate and efficiency.
format Article
id doaj-art-15ee579b611743c7ba396fde325d1f80
institution Kabale University
issn 2076-0825
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Actuators
spelling doaj-art-15ee579b611743c7ba396fde325d1f80
2025-01-24T13:15:10Z
eng
MDPI AG
Actuators
2076-0825
2025-01-01
Volume 14, Issue 1, Article 15
10.3390/act14010015
Swarm Confrontation Algorithm for UGV Swarm with Quantity Advantage by a Novel MSRM-MAPOCA Training Method
Huanli Gao, Chongming Zhao, Xinghe Yu, Shuangfei Ren, He Cai (all authors: School of Automation Science and Engineering, South China University of Technology, Guangzhou 510641, China)
https://www.mdpi.com/2076-0825/14/1/15
title Swarm Confrontation Algorithm for UGV Swarm with Quantity Advantage by a Novel MSRM-MAPOCA Training Method
topic swarm confrontation
multi-agent reinforcement learning
macro states reward mechanism
smooth reward
unmanned ground vehicle
url https://www.mdpi.com/2076-0825/14/1/15