Advancing ship automatic navigation strategy with prior knowledge and hierarchical penalty in irregular obstacles: a reinforcement learning approach to enhanced efficiency and safety
With the global wave of intelligence and automation, ship autopilot technology has become the key to improving the efficiency of marine transportation, reducing operating costs, and ensuring navigation safety. However, existing reinforcement learning (RL)–based autopilot methods still face challenge...
| Main Authors: | Hao Zhang, Jiawen Li, Liang Cao, Shucan Wang, Ronghui Li |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-05-01 |
| Series: | Frontiers in Marine Science |
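| Subjects: | deep reinforcement learning; unmanned ship; prior knowledge; hierarchical composite reward and penalties; irregular obstacle |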
| Online Access: | https://www.frontiersin.org/articles/10.3389/fmars.2025.1598380/full |
| _version_ | 1849716272803086336 |
|---|---|
| author | Hao Zhang; Jiawen Li; Liang Cao; Shucan Wang; Ronghui Li |
| author_facet | Hao Zhang; Jiawen Li; Liang Cao; Shucan Wang; Ronghui Li |
| author_sort | Hao Zhang |
| collection | DOAJ |
| description | Amid the global shift toward intelligence and automation, ship autopilot technology has become key to improving the efficiency of marine transportation, reducing operating costs, and ensuring navigation safety. However, existing reinforcement learning (RL)-based autopilot methods still suffer from low learning efficiency, redundant invalid exploration, and limited obstacle avoidance capability. To address these problems, this study proposes GEPA, a model built on the deep Q-network (DQN) that integrates prior knowledge with a hierarchical reward and penalty mechanism to optimize the autopilot strategy of unmanned vessels. Prior knowledge guides the agent's decision-making, which reduces invalid exploration and accelerates learning convergence. The hierarchical composite reward and penalty mechanism improves the rationality and safety of the autopilot through end-point incentives, path-guided rewards, and irregular-obstacle avoidance penalties. Experimental results show that GEPA outperforms existing methods in navigation efficiency, training convergence speed, path smoothness, obstacle avoidance ability, and safety: the number of training rounds needed to complete the task falls by 24.85%, the path length is reduced by up to about 70 pixels, the safety distance improves by 70.6%, and the number of collisions decreases significantly. The study provides an effective RL optimization strategy for efficient and safe autonomous navigation of unmanned ships in complex marine environments, and offers theoretical support and practical guidance for the development of future intelligent ship technology. |
| format | Article |
| id | doaj-art-c3b55dba8f7a439aa439cbc2e0744c0f |
| institution | DOAJ |
| issn | 2296-7745 |
| language | English |
| publishDate | 2025-05-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Marine Science |
| spelling | doaj-art-c3b55dba8f7a439aa439cbc2e0744c0f; 2025-08-20T03:13:04Z; eng; Frontiers Media S.A.; Frontiers in Marine Science; 2296-7745; 2025-05-01; 12; 10.3389/fmars.2025.1598380; 1598380; Advancing ship automatic navigation strategy with prior knowledge and hierarchical penalty in irregular obstacles: a reinforcement learning approach to enhanced efficiency and safety; Hao Zhang [Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China]; Jiawen Li [Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China / Key Laboratory of Philosophy and Social Science in Hainan Province of Hainan Free Trade Port International Shipping Development and Property Digitization, Hainan Vocational University of Science and Technology, Haikou, China / Technical Research Center for Ship Intelligence and Safety Engineering of Guangdong, Zhanjiang, Guangdong, China / Guangdong Provincial Key Laboratory of Intelligent Equipment for South China Sea Marine Ranching, Zhanjiang, Guangdong, China]; Liang Cao [Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China / Technical Research Center for Ship Intelligence and Safety Engineering of Guangdong, Zhanjiang, Guangdong, China / Guangdong Provincial Key Laboratory of Intelligent Equipment for South China Sea Marine Ranching, Zhanjiang, Guangdong, China]; Shucan Wang [Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China]; Ronghui Li [Naval Architecture and Shipping College, Guangdong Ocean University, Zhanjiang, China / Technical Research Center for Ship Intelligence and Safety Engineering of Guangdong, Zhanjiang, Guangdong, China / Guangdong Provincial Key Laboratory of Intelligent Equipment for South China Sea Marine Ranching, Zhanjiang, Guangdong, China]; https://www.frontiersin.org/articles/10.3389/fmars.2025.1598380/full; deep reinforcement learning; unmanned ship; prior knowledge; hierarchical composite reward and penalties; irregular obstacle |
| title | Advancing ship automatic navigation strategy with prior knowledge and hierarchical penalty in irregular obstacles: a reinforcement learning approach to enhanced efficiency and safety |
| topic | deep reinforcement learning; unmanned ship; prior knowledge; hierarchical composite reward and penalties; irregular obstacle |
| url | https://www.frontiersin.org/articles/10.3389/fmars.2025.1598380/full |
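
The abstract in this record describes a hierarchical composite reward built from end-point incentives, path-guided rewards, and irregular-obstacle avoidance penalties, but the record does not give the formulas. The following is only a minimal sketch of how such a tiered reward could be structured, not the authors' implementation. Everything in it is an assumption made for illustration: the names `RewardConfig` and `hierarchical_reward`, all constants, the polygon representation of irregular obstacles, and the use of progress toward the goal as the path-guided term.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]  # irregular obstacle as an ordered list of vertices


@dataclass(frozen=True)
class RewardConfig:
    # All values are hypothetical; the record does not state any constants.
    goal_reward: float = 100.0
    goal_radius: float = 10.0
    progress_scale: float = 1.0
    collision_penalty: float = -100.0
    warning_distance: float = 30.0
    warning_scale: float = 10.0


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray-casting test: True if p lies strictly inside poly."""
    inside = False
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) / (y2 - y1) * (x2 - x1)
            if p[0] < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(p: Point, poly: Polygon) -> float:
    """Shortest distance from p to the boundary of poly."""
    return min(
        point_segment_distance(p, poly[i], poly[(i + 1) % len(poly)])
        for i in range(len(poly))
    )


def hierarchical_reward(prev_pos: Point, pos: Point, goal: Point,
                        obstacles: List[Polygon],
                        cfg: RewardConfig = RewardConfig()) -> Tuple[float, bool]:
    """Return (reward, episode_done) for one step of the ship."""
    # Tier 1: a collision ends the episode with the largest penalty.
    if any(point_in_polygon(pos, poly) for poly in obstacles):
        return cfg.collision_penalty, True
    # Tier 2: end-point incentive once the ship is within the goal radius.
    d_goal = math.hypot(goal[0] - pos[0], goal[1] - pos[1])
    if d_goal <= cfg.goal_radius:
        return cfg.goal_reward, True
    # Tier 3a: path-guided term, positive when the step moves toward the goal.
    d_prev = math.hypot(goal[0] - prev_pos[0], goal[1] - prev_pos[1])
    reward = cfg.progress_scale * (d_prev - d_goal)
    # Tier 3b: graded penalty that grows as the ship nears any obstacle edge.
    if obstacles:
        nearest = min(distance_to_polygon(pos, poly) for poly in obstacles)
        if nearest < cfg.warning_distance:
            reward -= cfg.warning_scale * (1.0 - nearest / cfg.warning_distance)
    return reward, False
```

In a DQN training loop, a function like this would be called once per environment step, and the returned flag would end the episode. Checking the terminal tiers first keeps the shaping terms from masking a collision or an arrival.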