CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models

Abstract: Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve VLMs' generalization, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model's ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization capability, with particularly strong performance in fine-grained classification tasks.
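As a minimal illustration of the two ideas named in the abstract (all class names, descriptions, and accuracy numbers below are hypothetical, and no actual LLM is called): a fixed hand-crafted template versus a descriptive, category-specific prompt, and the harmonic mean (HM) used in base-to-new generalization benchmarks to balance accuracy on known (base) and unseen (new) classes.

```python
def fixed_template_prompt(class_name: str) -> str:
    # Fixed hand-crafted template: the kind of prior knowledge the
    # abstract says can limit fine-grained category distinctions.
    return f"a photo of a {class_name}."

def custom_prompt(class_name: str, description: str) -> str:
    # CuTCP-style idea: `description` would come from a large language
    # model; here it is a hand-written stand-in for illustration.
    return f"a photo of a {class_name}, {description}."

def harmonic_mean(base_acc: float, new_acc: float) -> float:
    # HM = 2 * base * new / (base + new); penalizes a model that does
    # well on base classes but poorly on new ones (or vice versa).
    return 2 * base_acc * new_acc / (base_acc + new_acc)

print(fixed_template_prompt("sparrow"))
print(custom_prompt("sparrow", "a small brown bird with streaked plumage"))
print(round(harmonic_mean(82.0, 74.0), 2))  # hypothetical accuracies
```

The reported gains (+0.74% on new classes, +0.44% HM over TCP) are averages over 11 datasets; the sketch only shows how such an HM figure is computed, not how the prompts are learned.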

Bibliographic Details
Main Authors: Min Huang, Chen Yang, Xiaoyan Yu
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Scientific Reports
Subjects: CLIP; CuTCP; Prompt learning; TCP; VLMs
Online Access: https://doi.org/10.1038/s41598-025-85838-x
Collection: DOAJ
Institution: Kabale University
ISSN: 2045-2322
Author affiliations: Min Huang, Chen Yang, Xiaoyan Yu (all Zhengzhou University of Light Industry)