Semantic-Guided Selective Representation for Image Captioning

Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature...

Bibliographic Details
Main Authors: Yinan Li, Yiwei Ma, Yiyi Zhou, Xiao Yu
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Fine-grained semantic guidance; relation-aware selection; image captioning
Online Access: https://ieeexplore.ieee.org/document/10041895/
author Yinan Li
Yiwei Ma
Yiyi Zhou
Xiao Yu
collection DOAJ
description Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme, with a Relation-Aware Selection (RAS) and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on grid-wise interactions, RAS can enhance the salient visual regions and channels and suppress the less important ones. In addition, this selection process is guided by FSG, which uses fine-grained semantic knowledge to supervise it. Experimental results on MS COCO show that the proposed RAS-FSG scheme achieves state-of-the-art performance on both off-line and on-line testing, i.e., 134.3 CIDEr for off-line testing and 135.4 for on-line testing. Extensive ablation studies and visualizations also validate the effectiveness of our scheme.
format Article
id doaj-art-5949735630ea407abd9c7dd21f93a79c
institution Kabale University
issn 2169-3536
language English
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-5949735630ea407abd9c7dd21f93a79c
2025-08-20T03:47:32Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
vol. 11, pp. 14500-14510
DOI 10.1109/ACCESS.2023.3243952
10041895
Semantic-Guided Selective Representation for Image Captioning
Yinan Li, https://orcid.org/0000-0001-9620-8241, Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China
Yiwei Ma, https://orcid.org/0000-0002-8744-3423, Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China
Yiyi Zhou, https://orcid.org/0000-0002-5110-4526, Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China
Xiao Yu, https://orcid.org/0000-0002-0314-9295, Digital Governance Laboratory, Sichuan Administration Institute, Chengdu, China
https://ieeexplore.ieee.org/document/10041895/
Fine-grained semantic guidance
relation-aware selection
image captioning
title Semantic-Guided Selective Representation for Image Captioning
topic Fine-grained semantic guidance
relation-aware selection
image captioning
url https://ieeexplore.ieee.org/document/10041895/