Advancing ship automatic navigation strategy with prior knowledge and hierarchical penalty in irregular obstacles: a reinforcement learning approach to enhanced efficiency and safety

With the global wave of intelligence and automation, ship autopilot technology has become the key to improving the efficiency of marine transportation, reducing operating costs, and ensuring navigation safety. However, existing reinforcement learning (RL)–based autopilot methods still face challenge...

Bibliographic Details
Main Authors: Hao Zhang, Jiawen Li, Liang Cao, Shucan Wang, Ronghui Li
Format: Article
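Language: English
Published: Frontiers Media S.A. 2025-05-01
Series: Frontiers in Marine Science
Subjects: deep reinforcement learning; unmanned ship; prior knowledge; hierarchical composite reward and penalties; irregular obstacle
Online Access: https://www.frontiersin.org/articles/10.3389/fmars.2025.1598380/full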
author Hao Zhang
Jiawen Li
Liang Cao
Shucan Wang
Ronghui Li
collection DOAJ
description With the global wave of intelligence and automation, ship autopilot technology has become key to improving the efficiency of marine transportation, reducing operating costs, and ensuring navigation safety. However, existing reinforcement learning (RL)–based autopilot methods still face challenges such as low learning efficiency, redundant invalid exploration, and limited obstacle avoidance capability. To address these challenges, this research proposes the GEPA model, which integrates prior knowledge with hierarchical reward and punishment mechanisms to optimize a deep Q-network (DQN)-based autopilot strategy for unmanned vessels. The GEPA model uses prior knowledge to guide the agent's decision-making, which reduces invalid exploration and accelerates learning convergence. It combines this guidance with a hierarchical composite reward and punishment mechanism that improves the rationality and safety of the autopilot through end-point incentives, path-guided rewards, and irregular obstacle avoidance penalties. The experimental results show that the GEPA model outperforms existing methods in navigation efficiency, training convergence speed, path smoothness, obstacle avoidance ability, and safety: the number of training rounds needed to complete the task is reduced by 24.85%, the path length is reduced by up to about 70 pixels, the safety distance is improved by 70.6%, and the number of collisions decreases significantly. The research provides an effective reinforcement learning optimization strategy for efficient and safe autonomous navigation of unmanned ships in complex marine environments, and can offer theoretical support and practical guidance for the development of future intelligent ship technology.
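The description names two mechanisms without giving formulas: a hierarchical composite reward (end-point incentive, path-guided reward, obstacle avoidance penalty) and prior knowledge that steers exploration. The sketch below is one possible reading of those ideas, not the GEPA model itself. Every constant, the eight-direction action set, the point-set representation of irregular obstacles, and the function names `composite_reward` and `prior_guided_action` are assumptions made for illustration only.

```python
import math
import random

# All constants are illustrative assumptions, not values from the paper.
GOAL_REWARD = 100.0         # end-point incentive
COLLISION_PENALTY = -100.0  # inner penalty tier: contact with an obstacle
WARNING_PENALTY = -5.0      # outer penalty tier: too close to an obstacle
GOAL_RADIUS = 5.0           # pixels
COLLISION_RADIUS = 5.0      # pixels
WARNING_RADIUS = 20.0       # pixels
PROGRESS_WEIGHT = 1.0       # path-guided reward per pixel of progress

# Hypothetical action set: eight compass moves on a pixel grid.
ACTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def composite_reward(prev_pos, pos, goal, obstacle_points):
    """Goal bonus first, then collision, then progress plus a graded penalty."""
    if math.dist(pos, goal) <= GOAL_RADIUS:
        return GOAL_REWARD
    # An irregular obstacle is assumed to be given as a set of occupied points,
    # so its shape does not matter: only the nearest point is used.
    nearest = min((math.dist(pos, p) for p in obstacle_points), default=math.inf)
    if nearest <= COLLISION_RADIUS:
        return COLLISION_PENALTY
    # Path-guided term: positive when the step moves the ship toward the goal.
    reward = PROGRESS_WEIGHT * (math.dist(prev_pos, goal) - math.dist(pos, goal))
    if nearest <= WARNING_RADIUS:
        # Penalty grows linearly as the ship approaches the collision radius.
        closeness = (WARNING_RADIUS - nearest) / (WARNING_RADIUS - COLLISION_RADIUS)
        reward += WARNING_PENALTY * closeness
    return reward


def prior_guided_action(q_values, pos, goal, epsilon, rng=random):
    """Epsilon-greedy selection whose exploration is biased toward the goal."""
    if rng.random() >= epsilon:
        return max(range(len(ACTIONS)), key=lambda a: q_values[a])
    # Assumed prior: only explore moves with a positive component toward the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    useful = [a for a, (dx, dy) in enumerate(ACTIONS) if dx * gx + dy * gy > 0]
    return rng.choice(useful or list(range(len(ACTIONS))))
```

In this sketch the reward terms are checked in a fixed priority order, which is one simple way to make the scheme "hierarchical"; the published method may weight or layer the terms differently. Restricting random exploration to goal-directed moves is likewise only one way prior knowledge could reduce invalid exploration.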
format Article
id doaj-art-c3b55dba8f7a439aa439cbc2e0744c0f
institution DOAJ
issn 2296-7745
language English
publishDate 2025-05-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Marine Science
spelling doaj-art-c3b55dba8f7a439aa439cbc2e0744c0f
timestamp 2025-08-20T03:13:04Z
citation Frontiers in Marine Science, vol. 12, article 1598380, 2025-05-01
doi 10.3389/fmars.2025.1598380
affiliations Hao Zhang: Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China
Jiawen Li: Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China; Key Laboratory of Philosophy and Social Science in Hainan Province of Hainan Free Trade Port International Shipping Development and Property Digitization, Hainan Vocational University of Science and Technology, Haikou, China; Technical Research Center for Ship Intelligence and Safety Engineering of Guangdong, Zhanjiang, Guangdong, China; Guangdong Provincial Key Laboratory of Intelligent Equipment for South China Sea Marine Ranching, Zhanjiang, Guangdong, China
Liang Cao: Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China; Technical Research Center for Ship Intelligence and Safety Engineering of Guangdong, Zhanjiang, Guangdong, China; Guangdong Provincial Key Laboratory of Intelligent Equipment for South China Sea Marine Ranching, Zhanjiang, Guangdong, China
Shucan Wang: Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China
Ronghui Li: Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China; Technical Research Center for Ship Intelligence and Safety Engineering of Guangdong, Zhanjiang, Guangdong, China; Guangdong Provincial Key Laboratory of Intelligent Equipment for South China Sea Marine Ranching, Zhanjiang, Guangdong, China
title Advancing ship automatic navigation strategy with prior knowledge and hierarchical penalty in irregular obstacles: a reinforcement learning approach to enhanced efficiency and safety
topic deep reinforcement learning
unmanned ship
prior knowledge
hierarchical composite reward and penalties
irregular obstacle
url https://www.frontiersin.org/articles/10.3389/fmars.2025.1598380/full