Learning with semantic ambiguity for unbiased scene graph generation

Scene graph generation (SGG) aims to identify and extract objects from images and elucidate their interrelations. This task faces two primary challenges. Firstly, the long-tail distribution of relation categories causes SGG models to favor high-frequency relations, such as “on” and “in”. Secondly, s...

Bibliographic Details
Main Authors:	Shanjin Zhong, Yang Cao, Qiaosen Chen, Jie Gong
Format:	Article
Language:	English
Published:	PeerJ Inc. 2025-01-01
Series:	PeerJ Computer Science
Subjects:	Scene graph generation; Long-tail distribution; Semantic ambiguity; Soft label
Online Access:	https://peerj.com/articles/cs-2639.pdf

author	Shanjin Zhong, Yang Cao, Qiaosen Chen, Jie Gong
collection	DOAJ
description	Scene graph generation (SGG) aims to identify and extract objects from images and elucidate their interrelations. This task faces two primary challenges. Firstly, the long-tail distribution of relation categories causes SGG models to favor high-frequency relations, such as “on” and “in”. Secondly, some subject-object pairs may have multiple reasonable relations, which often possess a certain degree of semantic similarity. However, the use of one-hot ground-truth relation labels does not effectively represent the semantic similarities and distinctions among relations. In response to these challenges, we propose a model-agnostic method named Mixup and Balanced Relation Learning (MBRL). This method assigns soft labels to samples exhibiting semantic ambiguities and optimizes model training by adjusting the loss weights for fine-grained and low-frequency relation samples. Its model-agnostic design facilitates seamless integration with diverse SGG models, enhancing their performance across various relation categories. Our approach is evaluated on widely used datasets, including Visual Genome and Generalized Question Answering, both with over 100,000 images, providing rich visual contexts for scene graph model evaluation. Experimental results show that our method outperforms state-of-the-art approaches on multiple scene graph generation tasks, demonstrating significant improvements in both relation prediction accuracy and the handling of imbalanced data distributions.
format	Article
id	doaj-art-27c45a014b434abd99b1172fdf4f1301
institution	Kabale University
issn	2376-5992
language	English
publishDate	2025-01-01
publisher	PeerJ Inc.
record_format	Article
series	PeerJ Computer Science
spelling	doaj-art-27c45a014b434abd99b1172fdf4f1301 | 2025-01-25T15:05:12Z | eng | PeerJ Inc. | PeerJ Computer Science | 2376-5992 | 2025-01-01 | 11 | e2639 | 10.7717/peerj-cs.2639 | Learning with semantic ambiguity for unbiased scene graph generation | Shanjin Zhong (School of Artificial Intelligence, South China Normal University, Foshan, Guangdong, China); Yang Cao (School of Computer Science, South China Normal University, Guangzhou, Guangdong, China); Qiaosen Chen (School of Computer Science, South China Normal University, Guangzhou, Guangdong, China); Jie Gong (School of Computer Science, South China Normal University, Guangzhou, Guangdong, China) | https://peerj.com/articles/cs-2639.pdf | Scene graph generation; Long-tail distribution; Semantic ambiguity; Soft label
title	Learning with semantic ambiguity for unbiased scene graph generation
topic	Scene graph generation; Long-tail distribution; Semantic ambiguity; Soft label
url	https://peerj.com/articles/cs-2639.pdf

Similar Items