Research on Ship Heave Motion Compensation Control Under Complex Sea State Environment Based on Improved Reinforcement Learning

Bibliographic Details
Main Authors: ZHANG Qin, ZHOU Jingyi, WANG Xingyue, HU Xiong
Format: Article
Language:English
Published: Editorial Department of Journal of Sichuan University (Engineering Science Edition) 2025-07-01
Series: 工程科学与技术 (Advanced Engineering Sciences)
Online Access:http://jsuese.scu.edu.cn/thesisDetails#10.12454/j.jsuese.202301015
Description
Summary:

Objective: In the vast expanse of the boundless sea, the capricious and ever-shifting interplay of wind and waves often presents unpredictable challenges to maritime operations. Ships frequently encounter powerful gusts and tumultuous swells, whose restless and complex motions not only pose a significant threat to the secure installation of offshore wind turbine units but also introduce considerable uncertainty into maritime operations and personnel transfers. These destabilizing factors can cause operational delays, equipment damage, or even harm to personnel, so offshore operations demand the utmost reliability, safety, and stability. To address these concerns and bolster the efficiency and safety of maritime work, researchers actively explore techniques for compensating the vertical motion of vessels. The objective of these techniques is to govern vessel motion precisely and counteract the heave provoked by wind and waves, ensuring the steadiness and security of offshore operations. Despite its immense potential and value, however, this technology faces significant challenges in practice. The inherent complexity and opacity of vessel systems hinder modeling and control, and the ability to adjust compensation strategies swiftly and accurately during actual operations, so as to accommodate ever-changing ocean conditions, remains an open problem. This study therefore presents a compensation control method for ship heave under complex sea conditions based on an improved reinforcement learning approach.

Methods: This method offers fresh insight into heave compensation for offshore operations and charts a new trajectory for future offshore operation technologies. The study employs principles of mechanics to build a comprehensive model of the wave compensation system, encompassing the servo drive, servo motor, encoder, and hydraulic cylinder. This model serves a dual purpose: it simulates the performance indicators of the vessel heave compensation system, and it functions as the training environment for reinforcement learning (a minimal environment skeleton is sketched after this passage). With the mechanical model established, the study formulates the agent's strategy and reward mechanism as a Markov decision process, within which the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm serves as the core control strategy. TD3 approximates the value function and policy with deep neural networks, equipping it to handle complex, nonlinear sea conditions. To cope with the uncertainty and complexity of the maritime environment, the study specifically adjusts the output layer of the Actor network by amplifying the amplitude of its tanh activation; this endows the Actor with the ability to generate a wider range of control actions and adapt to the capriciousness of the sea (see the actor sketch below). During training, the study employs two independent network structures, a main network and a target network, each comprising an Actor network and twin Critic networks, for a total of six networks.
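The abstract describes the mechanical model only at the component level (servo drive, servo motor, encoder, hydraulic cylinder) and gives no equations, so any executable version is necessarily a placeholder. The skeleton below is a minimal sketch of how such a model could expose a reinforcement-learning training interface; the class name `HeaveCompensationEnv`, the first-order cylinder dynamics, the regular sinusoidal wave, and all parameter values are illustrative assumptions, not the paper's model.

```python
import numpy as np

class HeaveCompensationEnv:
    """Skeleton of a heave-compensation training environment.

    The paper derives its environment from a mechanical model of the
    servo drive, servo motor, encoder, and hydraulic cylinder; the
    abstract gives no equations, so the first-order cylinder lag and
    the regular wave below are placeholders for illustration only.
    """

    def __init__(self, dt: float = 0.01, wave_amp: float = 0.5,
                 wave_freq: float = 0.4, tau: float = 0.2):
        self.dt, self.wave_amp, self.wave_freq, self.tau = dt, wave_amp, wave_freq, tau
        self.reset()

    def reset(self) -> np.ndarray:
        self.t = 0.0
        self.cyl_pos = 0.0              # hydraulic cylinder extension (m)
        return self._obs()

    def _wave(self) -> float:
        # Placeholder regular wave; a real study would drive this with
        # an irregular sea-state spectrum for levels three to six.
        return self.wave_amp * np.sin(2 * np.pi * self.wave_freq * self.t)

    def _obs(self) -> np.ndarray:
        return np.array([self._wave(), self.cyl_pos])

    def step(self, action: float):
        # First-order actuator response to the drive command.
        self.cyl_pos += self.dt * (action - self.cyl_pos) / self.tau
        self.t += self.dt
        error = self._wave() - self.cyl_pos   # residual heave after compensation
        reward = -abs(error)                  # simple stand-in reward
        return self._obs(), reward, False, {}
```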
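The paper's key architectural change is amplifying the tanh output of the Actor. A minimal PyTorch sketch of such an actor follows; the hidden sizes, `state_dim`, `action_dim`, and the scaling constant `max_action` are assumptions, since the abstract reports neither the network dimensions nor the amplification factor.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy network with an amplified tanh output layer.

    `state_dim`, `action_dim`, and `max_action` are illustrative
    placeholders; the paper does not report its actual values.
    """
    def __init__(self, state_dim: int, action_dim: int, max_action: float):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )
        # Amplifying the bounded tanh output widens the range of
        # control actions the agent can issue, as the Methods describe.
        self.max_action = max_action

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.max_action * torch.tanh(self.net(state))
```

Scaling a bounded tanh output is a standard way to widen a deterministic policy's action range while keeping the saturation that holds actions within actuator limits.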
Through iterative updates of these networks, the system continually learns and optimizes its control strategy, ultimately generating self-learned optimal control actions. To enhance the agent's adaptability amid complex sea conditions, the study injects Ornstein-Uhlenbeck (OU) action noise into the target policy. OU noise is a stochastic process that produces smooth, temporally correlated random oscillations in continuous time, making it particularly suited to exploration in continuous action spaces; in reinforcement learning, it helps the agent explore more broadly during the early stages of training and discover advantageous state-action pairs (a sketch of the process appears after this summary). In addition, the study devises a reward function that combines linear and Gaussian components to guide the agent's learning and decision making (also sketched after this summary). This composite reward not only reflects the quality of the current state-action pair but also incorporates predictions and evaluations of future states, sharpening the agent's understanding of the task objective and enabling it to form effective strategies over long training horizons. With this design, the agent gradually adapts to dynamically shifting sea conditions while evading local optima. Even under variable and complicated sea states, the agent continually refines its compensation strategy through self-learning and adaptive adjustment, raising compensation accuracy, assuring the secure installation of offshore wind turbine units, and safeguarding personnel transfers.

Results and Discussions: Simulation experiments demonstrate the effectiveness of the improved TD3 algorithm for compensation control under adverse, complex sea conditions. The trained model is applied to a simulated vessel heave compensation system and subjected to a range of complex conditions, spanning sea states from level three to level six as well as varying marine environments. Across these test scenarios, the improved TD3 algorithm exhibits remarkable adaptability and stability, attaining a maximum compensation efficiency of 99.95%. This result underscores the algorithm's compensation control capability and the high degree of safety it affords the installation of offshore wind turbine units. The improved algorithm surpasses step control methods optimized through particle swarm optimization and outperforms the traditional TD3 reinforcement learning method. It also generalizes well, quickly adapting and generating effective compensation control strategies even in untrained, novel sea conditions.

Conclusions: The improved TD3 algorithm thus holds broad potential and application value in vessel heave compensation, furnishing robust technical support for the installation of offshore wind turbine units and the safety of offshore operations. By melding mechanics, reinforcement learning, and innovative control strategies, this study advances an accurate and dependable system for maritime operations.
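For the OU action noise, a minimal sketch of the discretized process follows; the parameters theta, sigma, and dt are common textbook defaults, not values reported by the paper.

```python
import numpy as np

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration
    noise for continuous action spaces.

    Discretization: x += theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1).
    Parameter values here are common defaults, not the paper's.
    """
    def __init__(self, size: int, mu: float = 0.0, theta: float = 0.15,
                 sigma: float = 0.2, dt: float = 1e-2):
        self.mu = mu * np.ones(size)
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.reset()

    def reset(self) -> None:
        # Restart the process at its long-run mean.
        self.x = self.mu.copy()

    def sample(self) -> np.ndarray:
        # Mean-reverting drift plus a scaled Gaussian increment yields
        # smooth, correlated oscillations over time.
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.randn(*self.x.shape))
        self.x = self.x + dx
        return self.x
```

Because each sample drifts back toward the mean while carrying over the previous value, consecutive actions vary smoothly rather than jumping as independent Gaussian draws would.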
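The abstract states that the reward combines linear and Gaussian components but does not give the exact form. One plausible reading, penalizing the residual heave error linearly everywhere while granting a sharp Gaussian bonus near zero error, is sketched below; the function name and the coefficients `k`, `amp`, and `sigma` are hypothetical.

```python
import numpy as np

def composite_reward(error: float, k: float = 1.0,
                     amp: float = 10.0, sigma: float = 0.05) -> float:
    """One plausible linear-plus-Gaussian reward on the heave
    compensation error (wave heave minus compensation displacement).

    The linear term penalizes error proportionally everywhere, while
    the Gaussian term rewards the agent sharply near zero error,
    pulling it toward high-precision compensation. All coefficients
    are illustrative; the paper does not report its values.
    """
    linear_term = -k * abs(error)                             # proportional penalty
    gaussian_term = amp * np.exp(-error**2 / (2 * sigma**2))  # precision bonus
    return linear_term + gaussian_term
```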
ISSN:2096-3246