Attention Mechanism-Based Cognition-Level Scene Understanding

Given a question–image input, a visual commonsense reasoning (VCR) model predicts an answer with a corresponding rationale, which requires inference abilities based on real-world knowledge. The VCR task, which calls for exploiting multi-source information as well as learning different levels of unde...

Bibliographic Details
Main Authors: Xuejiao Tang, Wenbin Zhang
Format: Article
Language: English
Published: MDPI AG 2025-03-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/3/203