CuTCP: Custom Text Generation-based Class-aware Prompt Tuning for visual-language models

Bibliographic Details
Main Authors: Min Huang, Chen Yang, Xiaoyan Yu
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-85838-x
Description
Summary: Visual-language models (VLMs) excel in cross-modal reasoning by synthesizing visual and linguistic features. Recent VLMs use prompt learning for fine-tuning, allowing adaptation to a variety of downstream tasks. TCP applies class-aware prompt tuning to improve VLM generalization, yet its reliance on fixed text templates as prior knowledge can limit adaptability to fine-grained category distinctions. To address this, we propose Custom Text Generation-based Class-aware Prompt Tuning (CuTCP). CuTCP leverages large language models to generate descriptive, category-specific prompts, embedding richer semantic information that enhances the model’s ability to differentiate between known and unseen categories. Compared with TCP, CuTCP achieves an improvement of 0.74% on new classes and 0.44% on the overall harmonic mean, averaged over 11 diverse image datasets. Experimental results demonstrate that CuTCP addresses the limitations of general prompt templates, significantly improving model adaptability and generalization, with particularly strong performance on fine-grained classification tasks.
ISSN: 2045-2322
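
To make the idea in the summary concrete, below is a minimal, illustrative sketch contrasting a fixed prompt template (as in TCP) with class-aware prompts enriched by per-class descriptions. It is not the authors' implementation: the text and image encoders are random stand-ins rather than a real VLM, and the class names, descriptions, and function names are hypothetical.

```python
# Illustrative sketch only: replace a fixed prompt template with
# LLM-generated, class-specific descriptions, encode both, and classify
# images by cosine similarity. Encoders here are random placeholders.
import torch
import torch.nn.functional as F

EMBED_DIM = 512

# Hypothetical per-class descriptions; in CuTCP these would come from an LLM.
class_descriptions = {
    "sparrow": "a small brown songbird with streaked plumage and a short beak",
    "finch":   "a small seed-eating bird with a stout conical bill",
}
fixed_template = "a photo of a {}."  # fixed-template baseline (as in TCP)

def encode_text(texts):
    """Stand-in for a frozen VLM text encoder; returns unit-norm vectors."""
    torch.manual_seed(abs(hash(tuple(texts))) % (2**31))
    return F.normalize(torch.randn(len(texts), EMBED_DIM), dim=-1)

def encode_image(batch):
    """Stand-in for a frozen VLM image encoder; returns unit-norm vectors."""
    return F.normalize(torch.randn(batch, EMBED_DIM), dim=-1)

def classify(image_feats, prompts):
    """Cosine-similarity classification over the encoded prompts."""
    text_feats = encode_text(prompts)
    logits = image_feats @ text_feats.t()  # (batch, num_classes)
    return logits.argmax(dim=-1)

classes = list(class_descriptions)
images = encode_image(batch=4)

# Fixed-template prompts vs. description-enriched, class-aware prompts.
template_prompts = [fixed_template.format(c) for c in classes]
custom_prompts = [f"a photo of a {c}, {class_descriptions[c]}" for c in classes]

print("template:", classify(images, template_prompts).tolist())
print("custom:  ", classify(images, custom_prompts).tolist())
```

In the method described in the abstract, such prompts would be encoded by the VLM's frozen text encoder and combined with learnable class-aware prompt tokens; this sketch only illustrates substituting fixed templates with description-enriched ones.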