Learning with semantic ambiguity for unbiased scene graph generation
Scene graph generation (SGG) aims to identify and extract objects from images and elucidate their interrelations. This task faces two primary challenges. Firstly, the long-tail distribution of relation categories causes SGG models to favor high-frequency relations, such as “on” and “in”. Secondly, s...
Main Authors: | Shanjin Zhong, Yang Cao, Qiaosen Chen, Jie Gong |
---|---|
Format: | Article |
Language: | English |
Published: | PeerJ Inc., 2025-01-01 |
Series: | PeerJ Computer Science |
Subjects: | Scene graph generation; Long-tail distribution; Semantic ambiguity; Soft label |
Online Access: | https://peerj.com/articles/cs-2639.pdf |
_version_ | 1832586395150974976 |
---|---|
author | Shanjin Zhong; Yang Cao; Qiaosen Chen; Jie Gong |
author_sort | Shanjin Zhong |
collection | DOAJ |
description | Scene graph generation (SGG) aims to identify and extract objects from images and elucidate their interrelations. This task faces two primary challenges. Firstly, the long-tail distribution of relation categories causes SGG models to favor high-frequency relations, such as “on” and “in”. Secondly, some subject-object pairs may have multiple reasonable relations, which often possess a certain degree of semantic similarity. However, the use of one-hot ground-truth relation labels does not effectively represent the semantic similarities and distinctions among relations. In response to these challenges, we propose a model-agnostic method named Mixup and Balanced Relation Learning (MBRL). This method assigns soft labels to samples exhibiting semantic ambiguities and optimizes model training by adjusting the loss weights for fine-grained and low-frequency relation samples. Its model-agnostic design facilitates seamless integration with diverse SGG models, enhancing their performance across various relation categories. Our approach is evaluated on widely-used datasets, including Visual Genome and Generalized Question Answering, both with over 100,000 images, providing rich visual contexts for scene graph model evaluation. Experimental results show that our method outperforms state-of-the-art approaches on multiple scene graph generation tasks, demonstrating significant improvements in both relation prediction accuracy and the handling of imbalanced data distributions. |
format | Article |
id | doaj-art-27c45a014b434abd99b1172fdf4f1301 |
institution | Kabale University |
issn | 2376-5992 |
language | English |
publishDate | 2025-01-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj-art-27c45a014b434abd99b1172fdf4f1301; 2025-01-25T15:05:12Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2025-01-01; 11; e2639; 10.7717/peerj-cs.2639; Learning with semantic ambiguity for unbiased scene graph generation; Shanjin Zhong (School of Artificial Intelligence, South China Normal University, Foshan, Guangdong, China); Yang Cao (School of Computer Science, South China Normal University, Guangzhou, Guangdong, China); Qiaosen Chen (School of Computer Science, South China Normal University, Guangzhou, Guangdong, China); Jie Gong (School of Computer Science, South China Normal University, Guangzhou, Guangdong, China); https://peerj.com/articles/cs-2639.pdf; Scene graph generation; Long-tail distribution; Semantic ambiguity; Soft label |
title | Learning with semantic ambiguity for unbiased scene graph generation |
title_sort | learning with semantic ambiguity for unbiased scene graph generation |
topic | Scene graph generation; Long-tail distribution; Semantic ambiguity; Soft label |
url | https://peerj.com/articles/cs-2639.pdf |
work_keys_str_mv | AT shanjinzhong learningwithsemanticambiguityforunbiasedscenegraphgeneration AT yangcao learningwithsemanticambiguityforunbiasedscenegraphgeneration AT qiaosenchen learningwithsemanticambiguityforunbiasedscenegraphgeneration AT jiegong learningwithsemanticambiguityforunbiasedscenegraphgeneration |
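
The description above mentions two ideas: soft labels for samples with semantically ambiguous relations, and adjusted loss weights for low-frequency relations. The sketch below illustrates those two ideas in a generic form only. It is not the MBRL formulation from the article: the function names, the `alpha` mixing value, the inverse-frequency weighting, and the toy relation counts are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label(num_classes, gt, similar, alpha=0.7):
    """Hypothetical soft label: `alpha` on the ground-truth relation,
    the remaining mass split evenly over `similar` relation indices.
    Falls back to a one-hot label when no similar relations are given.
    `similar` is assumed not to contain `gt`."""
    target = [0.0] * num_classes
    if not similar:
        target[gt] = 1.0
        return target
    target[gt] = alpha
    share = (1.0 - alpha) / len(similar)
    for k in similar:
        target[k] += share
    return target


def inverse_frequency_weights(counts):
    """Assumed re-weighting scheme: weight proportional to 1 / count,
    rescaled so the weights average to 1."""
    inv = [1.0 / c for c in counts]
    mean = sum(inv) / len(inv)
    return [w / mean for w in inv]


def weighted_soft_ce(logits, target, weights):
    """Cross-entropy against a soft target, with one weight per class."""
    probs = softmax(logits)
    return -sum(
        w * t * math.log(p)
        for w, t, p in zip(weights, target, probs)
        if t > 0
    )


# Toy setup (made-up numbers): three relations with a long-tailed count.
relations = ["on", "standing on", "parked on"]
counts = [1000, 50, 20]

# Ground truth is "standing on"; "on" is treated as a plausible alternative.
target = soft_label(len(relations), gt=1, similar=[0])
weights = inverse_frequency_weights(counts)
loss = weighted_soft_ce([2.0, 1.0, 0.1], target, weights)
```

In this toy setup the rare relation "parked on" receives the largest weight and the frequent relation "on" the smallest, so errors on tail relations contribute more to the loss; the soft target keeps some probability mass on "on" rather than treating it as strictly wrong.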