TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models


Bibliographic Details
Main Authors: Weichuan Mo, Kongyang Chen, Yatie Xiao
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Mathematics
Subjects: pre-trained language model; backdoor; watermarking; fine-tuning
Online Access: https://www.mdpi.com/2227-7390/13/2/272
author Weichuan Mo
Kongyang Chen
Yatie Xiao
collection DOAJ
description Pre-trained language models such as BERT, GPT-3, and T5 have made significant advancements in natural language processing (NLP). However, their widespread adoption raises concerns about intellectual property (IP) protection, as unauthorized use can undermine innovation. Watermarking has emerged as a promising solution for model ownership verification, but its application to NLP models presents unique challenges, particularly in ensuring robustness against fine-tuning and preventing interference with downstream tasks. This paper presents a novel watermarking scheme, TIBW (Task-Independent Backdoor Watermarking), that embeds robust, task-independent backdoor watermarks into pre-trained language models. By implementing a Trigger–Target Word Pair Search Algorithm that selects trigger–target word pairs with maximal semantic dissimilarity, our approach ensures that the watermark remains effective even after extensive fine-tuning. Additionally, we introduce Parameter Relationship Embedding (PRE) to subtly modify the model’s embedding layer, reinforcing the association between trigger and target words without degrading the model performance. We also design a comprehensive watermark verification process that evaluates task behavior consistency, quantified by the Watermark Embedding Success Rate (WESR). Our experiments across five benchmark NLP tasks demonstrate that the proposed watermarking method maintains a near-baseline performance on clean inputs while achieving a high WESR, outperforming existing baselines in both robustness and stealthiness. Furthermore, the watermark persists reliably even after additional fine-tuning, highlighting its resilience against potential watermark removal attempts. This work provides a secure and reliable IP protection mechanism for NLP models, ensuring watermark integrity across diverse applications.
format Article
id doaj-art-9f7dc3cd58dd43af8d49942a22a5be1a
institution Kabale University
issn 2227-7390
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Mathematics
doi 10.3390/math13020272
citation Mathematics, vol. 13, no. 2, article 272 (2025)
affiliation Weichuan Mo: School of Artificial Intelligence, Guangzhou University, Guangzhou 510006, China
affiliation Kongyang Chen: School of Artificial Intelligence, Guangzhou University, Guangzhou 510006, China
affiliation Yatie Xiao: School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China
title TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models
topic pre-trained language model
backdoor
watermarking
fine-tuning
url https://www.mdpi.com/2227-7390/13/2/272
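
Illustrative note: the abstract describes a Trigger–Target Word Pair Search Algorithm that selects trigger–target word pairs with maximal semantic dissimilarity. The sketch below is not the authors' implementation; it only illustrates that idea under stated assumptions: dissimilarity is taken as cosine distance between input-embedding vectors of a BERT-style model accessed through the Hugging Face transformers API, and the function name `find_trigger_target_pairs`, the model choice, and the candidate word list are all hypothetical.

```python
# Minimal sketch (not the paper's code) of a trigger-target word pair search
# driven by maximal semantic dissimilarity in a pre-trained model's embedding space.
# Assumptions: dissimilarity = 1 - cosine similarity of input-embedding vectors;
# the candidate vocabulary, model choice, and function name are illustrative only.
import torch
from transformers import AutoModel, AutoTokenizer


def find_trigger_target_pairs(model_name="bert-base-uncased", candidates=None, top_k=5):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    emb = model.get_input_embeddings().weight.detach()  # [vocab_size, hidden_dim]

    # Hypothetical candidate words; a real search could scan the whole vocabulary.
    if candidates is None:
        candidates = ["cloud", "stone", "happy", "ocean", "silent", "rapid", "green", "logic"]
    ids = [tokenizer.convert_tokens_to_ids(w) for w in candidates]
    vecs = torch.nn.functional.normalize(emb[ids], dim=-1)

    # Pairwise cosine similarity; lower similarity means higher semantic dissimilarity.
    sim = vecs @ vecs.T
    sim.fill_diagonal_(float("inf"))  # exclude self-pairs from the ranking

    pairs, seen = [], set()
    for idx in sim.flatten().argsort().tolist():  # ascending: most dissimilar first
        i, j = divmod(idx, len(candidates))
        if (j, i) in seen or sim[i, j] == float("inf"):
            continue
        seen.add((i, j))
        pairs.append((candidates[i], candidates[j], sim[i, j].item()))
        if len(pairs) >= top_k:
            break
    return pairs


if __name__ == "__main__":
    # Print the most dissimilar candidate pairs as possible trigger/target choices.
    for trigger, target, cos_sim in find_trigger_target_pairs():
        print(f"trigger={trigger!r}  target={target!r}  cosine={cos_sim:.3f}")
```

On this reading, a pair of words with very low cosine similarity would be preferred as trigger and target, matching the abstract's stated goal of maximal semantic dissimilarity; how the paper actually measures dissimilarity or embeds the association (its Parameter Relationship Embedding step) is not reproduced here.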