Learning with semantic ambiguity for unbiased scene graph generation

Bibliographic Details
Main Authors: Shanjin Zhong, Yang Cao, Qiaosen Chen, Jie Gong
Format: Article
Language: English
Published: PeerJ Inc. 2025-01-01
Series: PeerJ Computer Science
Subjects: Scene graph generation, Long-tail distribution, Semantic ambiguity, Soft label
Online Access: https://peerj.com/articles/cs-2639.pdf
_version_ 1832586395150974976
collection DOAJ
description Scene graph generation (SGG) aims to detect the objects in an image and describe the relations among them. The task faces two main challenges. First, the long-tail distribution of relation categories biases SGG models toward high-frequency relations such as “on” and “in”. Second, some subject-object pairs admit several reasonable relations, which are often semantically similar, yet one-hot ground-truth relation labels cannot represent the similarities and distinctions among relations. To address these challenges, we propose a model-agnostic method named Mixup and Balanced Relation Learning (MBRL). The method assigns soft labels to semantically ambiguous samples and adjusts the loss weights for fine-grained and low-frequency relation samples during training. Because it is model-agnostic, it can be integrated with diverse SGG models to improve their performance across relation categories. We evaluate the approach on two widely used datasets, Visual Genome and Generalized Question Answering, each with over 100,000 images. Experimental results show that the method outperforms state-of-the-art approaches on multiple scene graph generation tasks, improving both relation prediction accuracy and the handling of imbalanced data distributions.
id doaj-art-27c45a014b434abd99b1172fdf4f1301
institution Kabale University
issn 2376-5992
spelling doaj-art-27c45a014b434abd99b1172fdf4f1301 2025-01-25T15:05:12Z eng
PeerJ Inc., PeerJ Computer Science, ISSN 2376-5992, 2025-01-01, volume 11, article e2639
DOI 10.7717/peerj-cs.2639
Shanjin Zhong, School of Artificial Intelligence, South China Normal University, Foshan, Guangdong, China
Yang Cao, School of Computer Science, South China Normal University, Guangzhou, Guangdong, China
Qiaosen Chen, School of Computer Science, South China Normal University, Guangzhou, Guangdong, China
Jie Gong, School of Computer Science, South China Normal University, Guangzhou, Guangdong, China
topic Scene graph generation
Long-tail distribution
Semantic ambiguity
Soft label