The CLIP - GPT Image Captioning Model Integrated with Global Semantics

Image captioning is a method for automatically generating language descriptions for images. Cross-modal semantic consistency is the core issue of shared subspace embedding when bridging pre-training models in the fields of computer vision and natural language processing to construct image captio...

Bibliographic Details
Main Authors:	TAO Rui, REN Honge, CAO Haiyan
Format:	Article
Language:	Chinese (zho)
Published:	Harbin University of Science and Technology Publications 2024-04-01
Series:	Journal of Harbin University of Science and Technology
Subjects:	cross-modal; image captioning; pre-training model; shared subspace; semantic alignment
Online Access:	https://hlgxb.hrbust.edu.cn/#/digest?ArticleID=2307

Similar Items