Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions

Abstract The generation of panoramic scene graphs represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictiv...

Bibliographic Details
Main Authors:	Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu
Format:	Article
Language:	English
Published:	Springer 2024-12-01
Series:	Complex & Intelligent Systems
Subjects:	Scene graph generation Panoptic scene graph Semantic information Attention mechanism
Online Access:	https://doi.org/10.1007/s40747-024-01746-z

id	doaj-art-96e1323cadbc466db32e20f788a19502
collection	DOAJ
institution	Kabale University
record_format	Article
timestamp	2025-02-02T12:49:46Z
issn	2199-4536 2198-6053
affiliations	Xinhe Kuang: Sydney Smart Technology College, Northeastern University
	Yuxin Che: School of Computer Science and Technology, North University of China
	Huiyan Han: School of Computer Science and Technology, North University of China
	Yimin Liu: Shanxi Center of Technology Innovation for Digital and Intelligent Integration of Cultural and Tourism Information
description	Abstract The generation of panoramic scene graphs represents a cutting-edge challenge in image scene understanding, necessitating sophisticated predictions of both intra-object relationships and interactions between objects and their backgrounds. This complexity tests the limits of current predictive models' ability to discern nuanced relationships within images. Conventional approaches often fail to effectively combine visual and semantic data, leading to predictions that are semantically impoverished. To address these issues, we propose a novel method of semantic-enhanced panoramic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks are stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms facilitate feature refinement within modalities, while cross-attention mechanisms promote feature fusion across modalities. The axial attention model is further applied to enhance the integration ability of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoramic scene graphs but also significantly improves recall rates, thereby enhancing the ability to predict relationships in scene graph generation.
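
The description field in the record above outlines three attention patterns: self-attention within a modality, cross-attention between the visual and semantic modalities, and axial attention for global context. The following is a minimal sketch of those general patterns in plain Python, not the authors' PSGAtten implementation. The function names (`attention`, `hybrid_attention`, `axial_attention`) are hypothetical, and the residual additions are an assumption. Learned query/key/value projections, multiple heads, and normalization layers are all omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors.

    Assumption: no learned projections; the inputs are used directly.
    """
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def add(xs, ys):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def hybrid_attention(visual, semantic):
    """One hybrid block: refine each modality, then fuse across modalities.

    Both inputs must share the same feature dimension (an assumption here).
    """
    # Self-attention: each modality attends to itself (refinement).
    visual = add(visual, attention(visual, visual, visual))
    semantic = add(semantic, attention(semantic, semantic, semantic))
    # Cross-attention: each modality queries the other (fusion).
    fused_visual = add(visual, attention(visual, semantic, semantic))
    fused_semantic = add(semantic, attention(semantic, visual, visual))
    return fused_visual, fused_semantic


def axial_attention(grid):
    """Attend along each row, then along each column, of a 2D feature grid.

    grid[r][c] is the feature vector at row r, column c.
    """
    rows = [attention(row, row, row) for row in grid]
    height, width = len(rows), len(rows[0])
    cols = []
    for c in range(width):
        column = [rows[r][c] for r in range(height)]
        cols.append(attention(column, column, column))
    # cols is indexed [column][row]; transpose back to [row][column].
    return [[cols[c][r] for c in range(width)] for r in range(height)]


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 visual tokens
    semantic = [[0.5, 0.5], [1.0, 0.0]]             # 2 semantic tokens
    fused_visual, fused_semantic = hybrid_attention(visual, semantic)
    print(len(fused_visual), len(fused_semantic))
```

Splitting attention into a row pass and a column pass lets every grid position receive information from the whole grid after two passes, while each pass only compares positions along a single axis.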


Similar Items