TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models


Bibliographic Details
Main Authors: Weichuan Mo, Kongyang Chen, Yatie Xiao
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Mathematics
Subjects: pre-trained language model; backdoor; watermarking; fine-tuning
Online Access: https://www.mdpi.com/2227-7390/13/2/272
author Weichuan Mo
Kongyang Chen
Yatie Xiao
collection DOAJ
description Pre-trained language models such as BERT, GPT-3, and T5 have made significant advancements in natural language processing (NLP). However, their widespread adoption raises concerns about intellectual property (IP) protection, as unauthorized use can undermine innovation. Watermarking has emerged as a promising solution for model ownership verification, but its application to NLP models presents unique challenges, particularly in ensuring robustness against fine-tuning and preventing interference with downstream tasks. This paper presents a novel watermarking scheme, TIBW (Task-Independent Backdoor Watermarking), that embeds robust, task-independent backdoor watermarks into pre-trained language models. By implementing a Trigger–Target Word Pair Search Algorithm that selects trigger–target word pairs with maximal semantic dissimilarity, our approach ensures that the watermark remains effective even after extensive fine-tuning. Additionally, we introduce Parameter Relationship Embedding (PRE) to subtly modify the model’s embedding layer, reinforcing the association between trigger and target words without degrading the model performance. We also design a comprehensive watermark verification process that evaluates task behavior consistency, quantified by the Watermark Embedding Success Rate (WESR). Our experiments across five benchmark NLP tasks demonstrate that the proposed watermarking method maintains a near-baseline performance on clean inputs while achieving a high WESR, outperforming existing baselines in both robustness and stealthiness. Furthermore, the watermark persists reliably even after additional fine-tuning, highlighting its resilience against potential watermark removal attempts. This work provides a secure and reliable IP protection mechanism for NLP models, ensuring watermark integrity across diverse applications.
format Article
id doaj-art-9f7dc3cd58dd43af8d49942a22a5be1a
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
doi 10.3390/math13020272
citation Mathematics, vol. 13, no. 2, article 272 (2025)
affiliation Weichuan Mo: School of Artificial Intelligence, Guangzhou University, Guangzhou 510006, China
affiliation Kongyang Chen: School of Artificial Intelligence, Guangzhou University, Guangzhou 510006, China
affiliation Yatie Xiao: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China
title TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models
topic pre-trained language model
backdoor
watermarking
fine-tuning
url https://www.mdpi.com/2227-7390/13/2/272
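
Illustrative note: the abstract describes a Trigger–Target Word Pair Search Algorithm that selects trigger–target word pairs with maximal semantic dissimilarity. The sketch below is not the authors' implementation; it only illustrates that idea under stated assumptions: dissimilarity is taken as cosine distance between input-embedding vectors of a BERT-style model accessed through the Hugging Face transformers API, and the function name `find_trigger_target_pairs`, the model choice, and the candidate word list are all hypothetical.

```python
# Minimal sketch (not the paper's code) of a trigger-target word pair search
# driven by maximal semantic dissimilarity in a pre-trained model's embedding space.
# Assumptions: dissimilarity = 1 - cosine similarity of input-embedding vectors;
# the candidate vocabulary, model choice, and function name are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer


def find_trigger_target_pairs(model_name="bert-base-uncased", candidates=None, top_k=5):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    emb = model.get_input_embeddings().weight.detach()  # [vocab_size, hidden_dim]

    # Hypothetical candidate words; a real search could scan the whole vocabulary.
    if candidates is None:
        candidates = ["cloud", "stone", "happy", "ocean", "silent", "rapid", "green", "logic"]
    ids = [tokenizer.convert_tokens_to_ids(w) for w in candidates]
    vecs = torch.nn.functional.normalize(emb[ids], dim=-1)

    # Pairwise cosine similarity; lower similarity means higher semantic dissimilarity.
    sim = vecs @ vecs.T
    sim.fill_diagonal_(float("inf"))  # exclude self-pairs from the ranking

    pairs, seen = [], set()
    for idx in sim.flatten().argsort().tolist():  # ascending: most dissimilar first
        i, j = divmod(idx, len(candidates))
        if (j, i) in seen or sim[i, j] == float("inf"):
            continue
        seen.add((i, j))
        pairs.append((candidates[i], candidates[j], sim[i, j].item()))
        if len(pairs) >= top_k:
            break
    return pairs


if __name__ == "__main__":
    # Print the most dissimilar candidate pairs as possible trigger/target choices.
    for trigger, target, cos_sim in find_trigger_target_pairs():
        print(f"trigger={trigger!r}  target={target!r}  cosine={cos_sim:.3f}")
```

On this reading, a pair of words with very low cosine similarity would be preferred as trigger and target, matching the abstract's stated goal of maximal semantic dissimilarity; how the paper actually measures dissimilarity or embeds the association (its Parameter Relationship Embedding step) is not reproduced here.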