Semantic-Guided Selective Representation for Image Captioning
Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature...
| Main Authors: | Yinan Li, Yiwei Ma, Yiyi Zhou, Xiao Yu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2023-01-01 |
| Series: | IEEE Access |
| Subjects: | Fine-grained semantic guidance; relation-aware selection; image captioning |
| Online Access: | https://ieeexplore.ieee.org/document/10041895/ |
| _version_ | 1849328664697634816 |
|---|---|
| author | Yinan Li; Yiwei Ma; Yiyi Zhou; Xiao Yu |
| author_facet | Yinan Li; Yiwei Ma; Yiyi Zhou; Xiao Yu |
| author_sort | Yinan Li |
| collection | DOAJ |
| description | Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme, with a Relation-Aware Selection (RAS) and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on the grid-wise interactions, RAS can enhance the salient visual regions and channels, and suppress the less important ones. In addition, this selection process is guided by FSG, which uses fine-grained semantic knowledge to supervise the selection process. Experimental results on MS COCO show that the proposed RAS-FSG scheme achieves state-of-the-art performance on both offline and online testing, i.e., 134.3 CIDEr for the offline testing and 135.4 for the online testing of MS COCO. Extensive ablation studies and visualizations also validate the effectiveness of our scheme. |
| format | Article |
| id | doaj-art-5949735630ea407abd9c7dd21f93a79c |
| institution | Kabale University |
| issn | 2169-3536 |
| language | English |
| publishDate | 2023-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-5949735630ea407abd9c7dd21f93a79c; 2025-08-20T03:47:32Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 14500; 14510; 10.1109/ACCESS.2023.3243952; 10041895; Semantic-Guided Selective Representation for Image Captioning; Yinan Li, 0, https://orcid.org/0000-0001-9620-8241; Yiwei Ma, 1, https://orcid.org/0000-0002-8744-3423; Yiyi Zhou, 2, https://orcid.org/0000-0002-5110-4526; Xiao Yu, 3, https://orcid.org/0000-0002-0314-9295; Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China; Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China; Media Analytics and Computing Laboratory, Department of Artificial Intelligence, School of Informatics, Xiamen University, Xiamen, China; Digital Governance Laboratory, Sichuan Administration Institute, Chengdu, China; Grid-based features have been proven to be as effective as region-based features in multi-modal tasks such as visual question answering. However, their application to image captioning encounters two main issues, namely, noisy features and fragmented semantics. In this paper, we propose a novel feature selection scheme, with a Relation-Aware Selection (RAS) and a Fine-grained Semantic Guidance (FSG) learning strategy. Based on the grid-wise interactions, RAS can enhance the salient visual regions and channels, and suppress the less important ones. In addition, this selection process is guided by FSG, which uses fine-grained semantic knowledge to supervise the selection process. Experimental results on MS COCO show that the proposed RAS-FSG scheme achieves state-of-the-art performance on both offline and online testing, i.e., 134.3 CIDEr for the offline testing and 135.4 for the online testing of MS COCO. Extensive ablation studies and visualizations also validate the effectiveness of our scheme.; https://ieeexplore.ieee.org/document/10041895/; Fine-grained semantic guidance; relation-aware selection; image captioning |
| spellingShingle | Yinan Li Yiwei Ma Yiyi Zhou Xiao Yu Semantic-Guided Selective Representation for Image Captioning IEEE Access Fine-grained semantic guidance relation-aware selection image captioning |
| title | Semantic-Guided Selective Representation for Image Captioning |
| title_full | Semantic-Guided Selective Representation for Image Captioning |
| title_fullStr | Semantic-Guided Selective Representation for Image Captioning |
| title_full_unstemmed | Semantic-Guided Selective Representation for Image Captioning |
| title_short | Semantic-Guided Selective Representation for Image Captioning |
| title_sort | semantic guided selective representation for image captioning |
| topic | Fine-grained semantic guidance; relation-aware selection; image captioning |
| url | https://ieeexplore.ieee.org/document/10041895/ |
| work_keys_str_mv | AT yinanli semanticguidedselectiverepresentationforimagecaptioning AT yiweima semanticguidedselectiverepresentationforimagecaptioning AT yiyizhou semanticguidedselectiverepresentationforimagecaptioning AT xiaoyu semanticguidedselectiverepresentationforimagecaptioning |