Attention Mechanism-Based Cognition-Level Scene Understanding

Given a question–image input, a visual commonsense reasoning (VCR) model predicts an answer with a corresponding rationale, which requires inference abilities based on real-world knowledge. The VCR task, which calls for exploiting multi-source information as well as learning different levels of unde...

Bibliographic Details
Main Authors:	Xuejiao Tang, Wenbin Zhang
Format:	Article
Language:	English
Published:	MDPI AG 2025-03-01
Series:	Information
Subjects:	visual commonsense reasoning; visual understanding
Online Access:	https://www.mdpi.com/2078-2489/16/3/203

Similar Items