CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models
Abstract Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve VLMs’ generalization, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model’s ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization capability, with particularly strong performance in fine-grained classification tasks.
Main Authors: | Min Huang, Chen Yang, Xiaoyan Yu |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Reports |
Subjects: | CLIP; CuTCP; Prompt learning; TCP; VLMs |
Online Access: | https://doi.org/10.1038/s41598-025-85838-x |
_version_ | 1832585875768213504 |
---|---|
author | Min Huang; Chen Yang; Xiaoyan Yu |
author_facet | Min Huang; Chen Yang; Xiaoyan Yu |
author_sort | Min Huang |
collection | DOAJ |
description | Abstract Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve VLMs’ generalization, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model’s ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization capability, with particularly strong performance in fine-grained classification tasks. |
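The abstract summarizes CuTCP's gains with the harmonic mean over base-class (seen) and new-class (unseen) accuracy, the standard summary metric in base-to-new generalization benchmarks: H = 2·base·new/(base + new). A minimal sketch of that metric, using hypothetical accuracies rather than the paper's reported numbers:

```python
def harmonic_mean(base_acc: float, new_acc: float) -> float:
    """Harmonic mean (H) of accuracy on base (seen) and new (unseen)
    classes; H is high only when a method does well on both splits."""
    if base_acc + new_acc == 0.0:
        return 0.0
    return 2.0 * base_acc * new_acc / (base_acc + new_acc)

# Hypothetical example (not the paper's results): a method scoring
# 82.0% on base classes and 74.0% on new classes.
h = harmonic_mean(82.0, 74.0)
print(f"H = {h:.2f}")  # H = 77.79
```

Because H is dominated by the lower of the two accuracies, an improvement on new classes (as CuTCP reports) lifts H even when base-class accuracy is unchanged.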
format | Article |
id | doaj-art-ca5b2a074dc145b0bcc7aecac7eb6f6f |
institution | Kabale University |
issn | 2045-2322 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Reports |
spelling | doaj-art-ca5b2a074dc145b0bcc7aecac7eb6f6f; 2025-01-26T12:26:18Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2025-01-01; 15; 1; 1; 11; 10.1038/s41598-025-85838-x; CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models; Min Huang (Zhengzhou University of Light Industry); Chen Yang (Zhengzhou University of Light Industry); Xiaoyan Yu (Zhengzhou University of Light Industry); Abstract Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve VLMs’ generalization, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model’s ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization capability, with particularly strong performance in fine-grained classification tasks.; https://doi.org/10.1038/s41598-025-85838-x; CLIP; CuTCP; Prompt learning; TCP; VLMs |
spellingShingle | Min Huang; Chen Yang; Xiaoyan Yu; CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models; Scientific Reports; CLIP; CuTCP; Prompt learning; TCP; VLMs |
title | CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models |
title_full | CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models |
title_fullStr | CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models |
title_full_unstemmed | CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models |
title_short | CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models |
title_sort | cutcp custom text generation based class aware prompt tuning for visual language models |
topic | CLIP; CuTCP; Prompt learning; TCP; VLMs |
url | https://doi.org/10.1038/s41598-025-85838-x |
work_keys_str_mv | AT minhuang cutcpcustomtextgenerationbasedclassawareprompttuningforvisuallanguagemodels AT chenyang cutcpcustomtextgenerationbasedclassawareprompttuningforvisuallanguagemodels AT xiaoyanyu cutcpcustomtextgenerationbasedclassawareprompttuningforvisuallanguagemodels |