The CLIP-GPT Image Captioning Model Integrated with Global Semantics

Image captioning is the task of automatically generating natural-language descriptions of images. Cross-modal semantic consistency is the core issue of shared subspace embedding when bridging pre-trained models from computer vision and natural language processing to construct image captio...

Bibliographic Details
Main Authors: TAO Rui, REN Honge, CAO Haiyan
Format: Article
Language: zho
Published: Harbin University of Science and Technology Publications 2024-04-01
Series: Journal of Harbin University of Science and Technology
Online Access: https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2307