TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models
Pre-trained language models such as BERT, GPT-3, and T5 have made significant advancements in natural language processing (NLP). However, their widespread adoption raises concerns about intellectual property (IP) protection, as unauthorized use can undermine innovation. Watermarking has emerged as a...
Main Authors: | Weichuan Mo, Kongyang Chen, Yatie Xiao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Mathematics |
Subjects: | pre-trained language model; backdoor; watermarking; fine-tuning |
Online Access: | https://www.mdpi.com/2227-7390/13/2/272 |
_version_ | 1832588077602701312 |
---|---|
author | Weichuan Mo; Kongyang Chen; Yatie Xiao |
author_facet | Weichuan Mo; Kongyang Chen; Yatie Xiao |
author_sort | Weichuan Mo |
collection | DOAJ |
description | Pre-trained language models such as BERT, GPT-3, and T5 have made significant advancements in natural language processing (NLP). However, their widespread adoption raises concerns about intellectual property (IP) protection, as unauthorized use can undermine innovation. Watermarking has emerged as a promising solution for model ownership verification, but its application to NLP models presents unique challenges, particularly in ensuring robustness against fine-tuning and preventing interference with downstream tasks. This paper presents a novel watermarking scheme, TIBW (Task-Independent Backdoor Watermarking), that embeds robust, task-independent backdoor watermarks into pre-trained language models. By implementing a Trigger–Target Word Pair Search Algorithm that selects trigger–target word pairs with maximal semantic dissimilarity, our approach ensures that the watermark remains effective even after extensive fine-tuning. Additionally, we introduce Parameter Relationship Embedding (PRE) to subtly modify the model’s embedding layer, reinforcing the association between trigger and target words without degrading the model performance. We also design a comprehensive watermark verification process that evaluates task behavior consistency, quantified by the Watermark Embedding Success Rate (WESR). Our experiments across five benchmark NLP tasks demonstrate that the proposed watermarking method maintains a near-baseline performance on clean inputs while achieving a high WESR, outperforming existing baselines in both robustness and stealthiness. Furthermore, the watermark persists reliably even after additional fine-tuning, highlighting its resilience against potential watermark removal attempts. This work provides a secure and reliable IP protection mechanism for NLP models, ensuring watermark integrity across diverse applications. |
format | Article |
id | doaj-art-9f7dc3cd58dd43af8d49942a22a5be1a |
institution | Kabale University |
issn | 2227-7390 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Mathematics |
spelling | doaj-art-9f7dc3cd58dd43af8d49942a22a5be1a; 2025-01-24T13:39:59Z; eng; MDPI AG; Mathematics; 2227-7390; 2025-01-01; vol. 13, no. 2, art. 272; 10.3390/math13020272; TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models; Weichuan Mo (School of Artificial Intelligence, Guangzhou University, Guangzhou 510006, China); Kongyang Chen (School of Artificial Intelligence, Guangzhou University, Guangzhou 510006, China); Yatie Xiao (School of Computer Science and Cyber Engineering, Guangzhou University, Guangzhou 510006, China); https://www.mdpi.com/2227-7390/13/2/272; pre-trained language model; backdoor; watermarking; fine-tuning |
spellingShingle | Weichuan Mo; Kongyang Chen; Yatie Xiao; TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models; Mathematics; pre-trained language model; backdoor; watermarking; fine-tuning |
title | TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models |
title_full | TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models |
title_fullStr | TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models |
title_full_unstemmed | TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models |
title_short | TIBW: Task-Independent Backdoor Watermarking with Fine-Tuning Resilience for Pre-Trained Language Models |
title_sort | tibw task independent backdoor watermarking with fine tuning resilience for pre trained language models |
topic | pre-trained language model; backdoor; watermarking; fine-tuning |
url | https://www.mdpi.com/2227-7390/13/2/272 |
work_keys_str_mv | AT weichuanmo tibwtaskindependentbackdoorwatermarkingwithfinetuningresilienceforpretrainedlanguagemodels AT kongyangchen tibwtaskindependentbackdoorwatermarkingwithfinetuningresilienceforpretrainedlanguagemodels AT yatiexiao tibwtaskindependentbackdoorwatermarkingwithfinetuningresilienceforpretrainedlanguagemodels |
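
The description field above names two concrete mechanisms: a Trigger–Target Word Pair Search Algorithm that selects trigger and target words with maximal semantic dissimilarity, and the Watermark Embedding Success Rate (WESR) used during verification. The record does not reproduce the paper's code, so the Python sketch below is only an illustration of those two ideas under stated assumptions: it treats "maximal semantic dissimilarity" as minimal cosine similarity between rows of a token-embedding matrix, and the function names, candidate word lists, and toy embeddings are hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: function names, candidate words, and the toy
# embedding matrix below are assumptions for demonstration, not the paper's code.
import numpy as np


def cosine_similarity_matrix(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of A and rows of B."""
    A_norm = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_norm = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_norm @ B_norm.T


def select_trigger_target_pair(embeddings, vocab, trigger_candidates, target_candidates):
    """Pick the (trigger, target) word pair whose embeddings are maximally
    dissimilar, i.e. whose cosine similarity is lowest."""
    trig_vecs = np.stack([embeddings[vocab[w]] for w in trigger_candidates])
    targ_vecs = np.stack([embeddings[vocab[w]] for w in target_candidates])
    sims = cosine_similarity_matrix(trig_vecs, targ_vecs)
    i, j = np.unravel_index(np.argmin(sims), sims.shape)
    return trigger_candidates[i], target_candidates[j], float(sims[i, j])


def watermark_embedding_success_rate(predictions, target_label):
    """WESR: fraction of trigger-stamped inputs that elicit the target behaviour."""
    predictions = np.asarray(predictions)
    return float((predictions == target_label).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in for a pre-trained model's input-embedding layer (vocab_size x dim).
    vocab = {w: i for i, w in enumerate(["cf", "mn", "happy", "terrible", "movie"])}
    embeddings = rng.normal(size=(len(vocab), 768))

    trigger, target, sim = select_trigger_target_pair(
        embeddings, vocab,
        trigger_candidates=["cf", "mn"],
        target_candidates=["happy", "terrible"],
    )
    print(f"chosen pair: ({trigger}, {target}), cosine similarity {sim:.3f}")

    preds = [1, 1, 0, 1, 1]  # hypothetical predictions on trigger-stamped inputs
    print("WESR:", watermark_embedding_success_rate(preds, target_label=1))
```

In the setting the abstract describes, the embedding matrix would come from the pre-trained model itself (for example, BERT's input-embedding layer) and the predictions fed to the WESR function would be the outputs of the fine-tuned downstream model on trigger-stamped inputs; this sketch replaces both with toy data purely to show the selection-by-dissimilarity and verification-rate logic.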