Visual Question Answering in Robotic Surgery: A Comprehensive Review

Bibliographic Details
Main Authors:	Di Ding, Tianliang Yao, Rong Luo, Xusen Sun
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Surgical visual question answering; multimodal learning; robotic surgery; visual grounding; medical AI
Online Access:	https://ieeexplore.ieee.org/document/10820517/

collection	DOAJ
description	Visual Question Answering (VQA) in robotic surgery is rapidly becoming a pivotal technology in medical AI, addressing the complex challenge of interpreting multimodal surgical data to support real-time decision-making. This comprehensive review synthesizes key advancements in Surgical VQA, highlighting the integration of large language models (LLMs), multimodal fusion techniques, and visual grounding methods. By reviewing 62 key studies selected through a systematic search of major scientific databases, including IEEE Xplore, Google Scholar, SpringerLink, and PubMed, we trace the evolution of VQA systems and their application in surgical environments. Current limitations, including dataset scarcity, multimodal alignment challenges, and limited interpretability, are critically examined. This survey aims not only to provide a structured overview of the field but also to identify critical research gaps and to propose future directions for enhancing VQA systems in robotic surgery, with the ultimate goal of improving intraoperative performance and patient outcomes.
id	doaj-art-465ca1a27e274e1b9e2153d5abd579de
institution	Kabale University
issn	2169-3536
volume	13
pages	9473-9484
doi	10.1109/ACCESS.2024.3525145
author_affiliation	Di Ding (ORCID https://orcid.org/0009-0005-2435-4019): Department of Cardiology, Qingpu Branch of Zhongshan Hospital, Fudan University, Shanghai, China
author_affiliation	Tianliang Yao (ORCID https://orcid.org/0009-0000-7063-3880): Department of Control Science and Engineering, College of Electronic and Information Engineering, Tongji University, Shanghai, China
author_affiliation	Rong Luo (ORCID https://orcid.org/0009-0000-9892-1315): Department of Cardiology, Qingpu Branch of Zhongshan Hospital, Fudan University, Shanghai, China
author_affiliation	Xusen Sun (ORCID https://orcid.org/0009-0006-8607-1265): Department of Cardiology, Qingpu Branch of Zhongshan Hospital, Fudan University, Shanghai, China
