Multiple Unmanned Aerial Vehicle Collaborative Target Search by DRL: A DQN-Based Multi-Agent Partially Observable Method

As Unmanned Aerial Vehicle (UAV) technology advances, UAVs have attracted widespread attention in both military and civilian fields owing to their low cost and flexibility. In unknown environments, UAVs can significantly reduce the risk of casualties and improve the safety and covertness of missions. Reinforcement Learning allows agents to learn optimal policies through trial and error in the environment, enabling UAVs to respond autonomously to real-time conditions. Because the observation range of UAV sensors is limited, UAV target search missions face the challenge of partial observability. To address this, the Partially Observable Deep Q-Network (PODQN), a DQN-based algorithm, is proposed. PODQN uses a Gated Recurrent Unit (GRU) to remember past observations, integrates a target network, and decomposes the action value for better evaluation. In addition, an artificial potential field is introduced to avoid collisions between UAVs. The simulation environment for UAV target search is constructed as a custom Markov Decision Process. Comparisons with a random strategy, DQN, Double DQN, Dueling DQN, VDN, and QMIX show that the proposed PODQN performs best under different agent configurations.
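The record does not include the paper's code, but the architecture the abstract names — a DQN variant whose encoder is a GRU over the history of partial observations, with the action value decomposed into value and advantage streams and a periodically synced target network — can be sketched. Everything below (class name, layer sizes, the dueling recombination) is an illustrative assumption, not the authors' implementation:

```python
# Minimal sketch of a PODQN-style network, assuming: GRU memory over partial
# observations, dueling-style action-value decomposition, and a separate
# target network. Sizes and names are illustrative, not from the paper.
import copy
import torch
import torch.nn as nn


class PODQNNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)  # memory of past observations
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim) -- the history of partial observations
        out, h = self.gru(obs_seq, h0)
        last = out[:, -1]  # GRU summary of everything observed so far
        v = self.value(last)
        a = self.advantage(last)
        # Dueling recombination: Q = V + (A - mean A) keeps V identifiable
        q = v + a - a.mean(dim=1, keepdim=True)
        return q, h


online = PODQNNet(obs_dim=16, n_actions=5)
target = copy.deepcopy(online)  # target network, synced periodically for stable TD targets

q, h = online(torch.randn(2, 8, 16))  # e.g. 2 UAVs, 8-step observation histories
```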

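Likewise, the collision handling the abstract attributes to an artificial potential field is commonly realized as a repulsive force that activates when two UAVs come within an influence radius. A minimal sketch under that assumption — the gain K, radius R, and the classic potential shape are standard APF choices, not taken from the paper:

```python
# Hedged sketch of artificial-potential-field collision avoidance: each
# neighbour inside radius R repels the UAV with force growing as separation
# shrinks. K and R are made-up parameters; the paper's exact potential is
# not part of this record.
import numpy as np


def repulsive_force(own_pos, other_positions, R=2.0, K=1.0):
    """Sum of repulsive forces pushing this UAV away from close neighbours."""
    force = np.zeros_like(own_pos, dtype=float)
    for p in other_positions:
        diff = own_pos - p
        d = np.linalg.norm(diff)
        if 0.0 < d < R:
            # Classic APF repulsion: magnitude K * (1/d - 1/R) / d^2,
            # directed away from the neighbour (diff / d).
            force += K * (1.0 / d - 1.0 / R) / d**2 * (diff / d)
    return force


# Example: a UAV at the origin with two neighbours inside the influence radius.
f = repulsive_force(np.array([0.0, 0.0]),
                    [np.array([1.0, 0.0]), np.array([0.0, 1.5])])
print(f)  # resultant force points away from both neighbours
```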
Bibliographic Details
Main Authors: Heng Xu, Dayong Zhu
Affiliation: School of Information and Software Engineering, University of Electronic Science and Technology of China, No. 4, Section 2, North Jianshe Road, Chengdu 610054, China
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Drones, Vol. 9, No. 1, Article 74
ISSN: 2504-446X
DOI: 10.3390/drones9010074
Subjects: deep q-network; partially observable; unmanned aerial vehicle; multi-agent; target search
Online Access: https://www.mdpi.com/2504-446X/9/1/74