Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions
Main Authors: | Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu |
---|---|
Format: | Article |
Language: | English |
Published: | Springer, 2024-12-01 |
Series: | Complex & Intelligent Systems |
Subjects: | Scene graph generation; Panoptic scene graph; Semantic information; Attention mechanism |
Online Access: | https://doi.org/10.1007/s40747-024-01746-z |
author | Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu |
---|---|
collection | DOAJ |
description | The generation of panoptic scene graphs is a cutting-edge challenge in image scene understanding: it requires predicting both relationships among objects and interactions between objects and their backgrounds. This complexity tests the limits of current models' ability to discern nuanced relationships within images. Conventional approaches often fail to combine visual and semantic data effectively, which leads to semantically impoverished predictions. To address these issues, we propose a novel method of semantic-enhanced panoptic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks is stacked within both the object context encoding and relationship context encoding modules to improve the refinement and fusion of visual and semantic information. Within each hybrid attention network, self-attention refines features within a modality, while cross-attention fuses features across modalities. An axial attention model is further applied to strengthen the integration of global information. Experiments on the PSG dataset show that our approach surpasses existing methods in generating detailed panoptic scene graphs and significantly improves recall, thereby strengthening relationship prediction in scene graph generation. |
format | Article |
id | doaj-art-96e1323cadbc466db32e20f788a19502 |
institution | Kabale University |
issn | 2199-4536, 2198-6053 |
language | English |
publishDate | 2024-12-01 |
publisher | Springer |
series | Complex & Intelligent Systems |
affiliation | Xinhe Kuang: Sydney Smart Technology College, Northeastern University; Yuxin Che: School of Computer Science and Technology, North University of China; Huiyan Han: School of Computer Science and Technology, North University of China; Yimin Liu: Shanxi Center of Technology Innovation for Digital and Intelligent Integration of Cultural and Tourism Information |
title | Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions |
topic | Scene graph generation; Panoptic scene graph; Semantic information; Attention mechanism |
url | https://doi.org/10.1007/s40747-024-01746-z |
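
The hybrid and axial attention described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' PSGAtten code: every function name below (`hybrid_attention_layer`, `stacked_hybrid_encoder`, `axial_attention` and the helpers) is hypothetical, and the sketch assumes single-head attention with no learned query/key/value projections, layer normalization without learned parameters after each residual, visual and semantic features that share one feature width, and cross-attention applied in parallel in both directions. It shows the two ideas the abstract names: self-attention refining features within each modality followed by cross-attention fusing the two modalities, and axial attention that attends along the width axis and then the height axis, so every position can draw on the whole feature map at a cost of O(HW(H+W)) rather than the O((HW)^2) of full 2-D attention.

```python
# Hypothetical illustration of hybrid and axial attention; not the PSGAtten implementation.
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    # Normalize each token's feature vector (no learned gain or bias in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def attention(q, k, v):
    # Scaled dot-product attention over the last two axes; leading axes act as batch axes.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def hybrid_attention_layer(visual, semantic):
    # Self-attention refines features within each modality ...
    visual = layer_norm(visual + attention(visual, visual, visual))
    semantic = layer_norm(semantic + attention(semantic, semantic, semantic))
    # ... then cross-attention fuses them: each modality queries the other.
    fused_visual = layer_norm(visual + attention(visual, semantic, semantic))
    fused_semantic = layer_norm(semantic + attention(semantic, visual, visual))
    return fused_visual, fused_semantic


def stacked_hybrid_encoder(visual, semantic, num_layers=3):
    # Stack several hybrid layers, as a context-encoding module might.
    for _ in range(num_layers):
        visual, semantic = hybrid_attention_layer(visual, semantic)
    return visual, semantic


def axial_attention(fmap):
    # fmap has shape (H, W, C). Pass 1: each row attends along the width axis.
    x = layer_norm(fmap + attention(fmap, fmap, fmap))
    # Pass 2: swap axes so each column attends along the height axis.
    xt = np.swapaxes(x, 0, 1)
    xt = layer_norm(xt + attention(xt, xt, xt))
    return np.swapaxes(xt, 0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical sizes: 5 visual (object) tokens and 7 semantic tokens of width 16.
    v, s = stacked_hybrid_encoder(rng.normal(size=(5, 16)), rng.normal(size=(7, 16)))
    print(v.shape, s.shape)  # (5, 16) (7, 16)
    print(axial_attention(rng.normal(size=(8, 10, 16))).shape)  # (8, 10, 16)
```

Running the two axial passes in sequence is what gives each position a global receptive field: after the width pass a position has seen its whole row, and the height pass then mixes those row-level features down each column.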