Semantic-enhanced panoptic scene graph generation through hybrid and axial attentions

Bibliographic Details
Main Authors: Xinhe Kuang, Yuxin Che, Huiyan Han, Yimin Liu
Format: Article
Language: English
Published: Springer, 2024-12-01
Series: Complex & Intelligent Systems
ISSN: 2199-4536, 2198-6053
Subjects: Scene graph generation; Panoptic scene graph; Semantic information; Attention mechanism
Online Access: https://doi.org/10.1007/s40747-024-01746-z

Abstract: The generation of panoptic scene graphs is a cutting-edge challenge in image scene understanding, requiring sophisticated prediction of both the relationships among objects and the interactions between objects and their backgrounds. This complexity tests the limits of current models' ability to discern nuanced relationships within images. Conventional approaches often fail to combine visual and semantic data effectively, yielding semantically impoverished predictions. To address these issues, we propose a novel method for semantic-enhanced panoptic scene graph generation through hybrid and axial attentions (PSGAtten). Specifically, a series of hybrid attention networks is stacked within both the object context encoding and relationship context encoding modules, enhancing the refinement and fusion of visual and semantic information. Within the hybrid attention networks, self-attention mechanisms refine features within each modality, while cross-attention mechanisms fuse features across modalities. An axial attention model is further applied to strengthen the integration of global information. Experimental validation on the PSG dataset confirms that our approach not only surpasses existing methods in generating detailed panoptic scene graphs but also significantly improves recall, thereby enhancing relationship prediction in scene graph generation.

Author affiliations:
Xinhe Kuang: Sydney Smart Technology College, Northeastern University
Yuxin Che: School of Computer Science and Technology, North University of China
Huiyan Han: School of Computer Science and Technology, North University of China
Yimin Liu: Shanxi Center of Technology Innovation for Digital and Intelligent Integration of Cultural and Tourism Information
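
To make the mechanisms named in the abstract concrete, below is a minimal sketch of a hybrid attention block (intra-modal self-attention followed by cross-modal fusion) and a 2D axial attention layer. This is an editor's illustration in PyTorch under stated assumptions: the module names, feature dimensions, residual and normalization choices, and the usage example are all hypothetical and do not reproduce the authors' PSGAtten implementation.

import torch
import torch.nn as nn

class HybridAttentionBlock(nn.Module):
    # Illustrative: self-attention refines each modality, cross-attention fuses them.
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_sem = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_vis = nn.LayerNorm(dim)
        self.norm_sem = nn.LayerNorm(dim)
        self.norm_out = nn.LayerNorm(dim)

    def forward(self, vis, sem):
        # Intra-modal refinement: each modality attends to itself.
        vis = self.norm_vis(vis + self.self_vis(vis, vis, vis)[0])
        sem = self.norm_sem(sem + self.self_sem(sem, sem, sem)[0])
        # Cross-modal fusion: visual queries attend to semantic keys/values.
        return self.norm_out(vis + self.cross(vis, sem, sem)[0])

class AxialAttention2D(nn.Module):
    # Illustrative: full 2D attention factored into width-axis and height-axis passes.
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, height, width, dim) feature map.
        b, h, w, d = x.shape
        rows = x.reshape(b * h, w, d)  # attend along each row independently
        rows = rows + self.row_attn(rows, rows, rows)[0]
        x = rows.reshape(b, h, w, d)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, d)  # attend along each column
        cols = cols + self.col_attn(cols, cols, cols)[0]
        return cols.reshape(b, w, h, d).permute(0, 2, 1, 3)

# Hypothetical usage: fuse per-object visual features with label embeddings,
# then aggregate global context over a spatial feature map.
vis = torch.randn(2, 16, 256)       # (batch, num_objects, feature_dim)
sem = torch.randn(2, 16, 256)       # e.g., embeddings of predicted object labels
fused = HybridAttentionBlock()(vis, sem)   # -> (2, 16, 256)
fmap = torch.randn(2, 24, 24, 256)  # (batch, H, W, feature_dim)
ctx = AxialAttention2D()(fmap)             # -> (2, 24, 24, 256)

The axial factorization is a standard way to approximate global 2D attention: it reduces the cost from O((HW)^2) for full attention to O(HW(H+W)), which is why it is commonly used to inject global context into dense feature maps.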