Combining Region-Guided Attention and Attribute Prediction for Thangka Image Captioning Method
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10833628/ |
Summary: | To improve the understanding of the core regions in Thangka images and the richness of the content generated during decoding, we propose a Thangka image captioning method based on Region-Guided Feature Enhancement and Attribute Prediction (RGFEAP). The region-guided image feature enhancement encoder introduces a region-guided module with a distance-weighted strategy to strengthen the feature representation of the central sacred elements in Thangka images. We also design a Thangka feature enhancement encoder that further refines the regional feature vectors, which are then fused with global features extracted by CLIP through multi-scale convolutional fusion, injecting richer object-related information into the Thangka image features. Furthermore, to improve the representation of detail when generating long-sequence captions of Thangka images, we design an attribute predictor that uses feature maps from four different convolutional blocks within the region-guided module to incorporate more detailed information into the model. Experimental results on the Thangka dataset show that RGFEAP achieves significant improvements over the baseline model ClipCap, with BLEU-1, BLEU-4, CIDEr, and METEOR scores increasing by 14.0%, 17.7%, 185.7%, and 11.5%, respectively. On the COCO dataset in the natural domain, RGFEAP achieves performance comparable to other state-of-the-art models, showcasing its strong adaptability. |
---|---|
ISSN: | 2169-3536 |
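
The summary mentions a distance-weighted strategy that emphasizes the central sacred elements, but this record does not give its exact form. The sketch below is only an illustration of the general idea, under assumptions: a Gaussian decay with distance from the image center, a hypothetical `sigma` parameter, and the helper names `distance_weights` and `weighted_pool` are all invented here and are not taken from the paper.

```python
import math


def distance_weights(centers, image_size, sigma=0.5):
    """Return one weight per region, larger for regions nearer the image center.

    Assumption: Gaussian decay on normalized distance; the paper's actual
    weighting function is not stated in this record.
    """
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0
    # Normalize by the half-diagonal so sigma does not depend on resolution.
    half_diag = math.hypot(cx, cy)
    raw = []
    for x, y in centers:
        d = math.hypot(x - cx, y - cy) / half_diag
        raw.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
    total = sum(raw)
    # Normalize so the weights sum to 1.
    return [w / total for w in raw]


def weighted_pool(features, weights):
    """Combine per-region feature vectors into one vector using the weights."""
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights)) for i in range(dim)]


# Hypothetical example: one central region and two corner regions.
centers = [(50, 50), (0, 0), (100, 100)]
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights = distance_weights(centers, image_size=(100, 100))
pooled = weighted_pool(features, weights)
```

In this toy setup the central region receives the largest weight, so its features dominate the pooled vector; a smaller `sigma` would sharpen that emphasis.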