Multi-Objective Dynamic Path Planning with Multi-Agent Deep Reinforcement Learning
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Journal of Marine Science and Engineering |
Subjects: | |
Online Access: | https://www.mdpi.com/2077-1312/13/1/20 |
Summary: | Multi-agent reinforcement learning (MARL) is characterized by its simple structure and strong adaptability, which has led to its widespread application in path planning. To address the challenge of optimal path planning for clusters of mobile agents in uncertain environments, a multi-objective dynamic path planning model (MODPP) based on multi-agent deep reinforcement learning (MADRL) is proposed. The model is suited to complex, unstable task environments prone to dimensionality explosion and offers scalability. The approach consists of two components, an action evaluation module and an action decision module, organized under a centralized training with decentralized execution (CTDE) architecture. During training, agents in the cluster can communicate with one another while learning cooperative strategies; at execution time they navigate the task environment without communication, producing collision-free paths that jointly optimize several sub-objectives: travel time, path distance, and the overall cost of turning. Furthermore, during real task execution the agents, acting as mobile entities, perform real-time obstacle avoidance. Finally, a simple multi-objective environment and a complex multi-objective environment were built on the OpenAI Gym platform to assess the rationality and effectiveness of the multi-objective dynamic path planning through minimum-cost and collision-risk evaluations, and the impact of the reward function configuration on agent strategies was discussed. (An illustrative sketch of the CTDE layout follows this record.) |
ISSN: | 2077-1312 |
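The abstract describes a CTDE arrangement in which decentralized action-decision modules (actors) act on local observations while a centralized action-evaluation module (critic) sees the joint state only during training, together with a reward that penalizes time, distance, and turning. The sketch below is a minimal illustration of that layout in Python/PyTorch, not the authors' implementation; the network sizes, the agent count, and the weighted-sum reward `multi_objective_reward` are assumptions chosen purely for demonstration.

```python
# Illustrative CTDE actor-critic sketch (assumed structure, not the paper's code).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 3, 8, 2  # assumed sizes for demonstration

class Actor(nn.Module):
    """Decentralized action-decision module: maps one agent's local observation to an action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh(),
        )

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Centralized action-evaluation module: scores the joint observations and actions of all agents."""
    def __init__(self):
        super().__init__()
        joint_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

def multi_objective_reward(step_time, step_dist, turn_angle, collided,
                           w_time=1.0, w_dist=1.0, w_turn=0.5, collision_penalty=10.0):
    """Assumed weighted-sum reward penalizing elapsed time, distance travelled, turning, and collisions."""
    r = -(w_time * step_time + w_dist * step_dist + w_turn * abs(turn_angle))
    if collided:
        r -= collision_penalty
    return r

# Usage: each actor acts on its own observation (no inter-agent communication needed at
# execution time), while the critic evaluates the joint observation-action during training.
actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()
obs = torch.randn(N_AGENTS, OBS_DIM)
acts = torch.stack([actors[i](obs[i]) for i in range(N_AGENTS)])
q_value = critic(obs.flatten(), acts.flatten())
```

The weighted-sum reward is only one plausible way to combine the time, distance, and turning sub-objectives mentioned in the abstract; the paper's actual reward configuration and its effect on agent strategies are discussed in the article itself.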