CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models
Main Authors:
Format: Article
Language: English
Published: Nature Portfolio, 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-85838-x
Summary: Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to various downstream tasks. TCP applies class-aware prompt tuning to improve VLMs' generalization, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model's ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization capability, with particularly strong performance in fine-grained classification tasks.
ISSN: 2045-2322
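The abstract describes the core idea of CuTCP: replacing TCP's fixed text templates with LLM-generated, class-specific descriptions whose embeddings carry richer semantic information. The sketch below illustrates only that prompt-generation and text-encoding step, not the authors' implementation; the helper `generate_class_description`, its fallback template, and the example class names are hypothetical stand-ins, and the frozen OpenAI CLIP encoder is assumed as the VLM backbone.

```python
# Minimal sketch of the CuTCP idea: swap a fixed template such as
# "a photo of a {class}" for an LLM-generated, class-specific description,
# then encode it with a frozen CLIP text encoder.
import torch
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

def generate_class_description(class_name: str) -> str:
    # Hypothetical stand-in for an LLM call (e.g., prompting an LLM with
    # "Describe the distinguishing visual features of a {class_name}.").
    # A fixed template is used here so the sketch runs without an LLM.
    return f"a photo of a {class_name}, a bird with distinctive plumage and bill shape"

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

class_names = ["sparrow", "goldfinch", "cardinal"]  # example fine-grained classes
prompts = [generate_class_description(c) for c in class_names]
tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    text_features = model.encode_text(tokens)              # (num_classes, dim)
    text_features /= text_features.norm(dim=-1, keepdim=True)

# An image can then be classified by cosine similarity between its CLIP
# image embedding and these class-aware text embeddings.
```

In CuTCP proper, such embeddings presumably serve as class-aware prior knowledge injected into prompt tuning rather than as a direct zero-shot classifier; this sketch covers only the step where custom text generation replaces the generic template.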