Coherent Interpretation of Entire Visual Field Test Reports Using a Multimodal Large Language Model (ChatGPT)

This study assesses the accuracy and consistency of a commercially available large language model (LLM) in extracting and interpreting sensitivity and reliability data from entire visual field (VF) test reports for the evaluation of glaucomatous defects. Single-page anonymised VF test reports from 60 eyes of 60 subjects were analysed by an LLM (ChatGPT 4o) across four domains: test reliability, defect type, defect severity and overall diagnosis. The main outcome measures were the accuracy of data extraction, interpretation of glaucomatous field defects and diagnostic classification. The LLM displayed 100% accuracy in extracting global sensitivity and reliability metrics and in classifying test reliability. It also demonstrated high accuracy (96.7%) in diagnosing whether a VF defect was consistent with a healthy, suspect or glaucomatous eye. Accuracy in correctly defining the type of defect was moderate (73.3%) and improved only partially when the model was given a more defined region of interest. Incorrect defect-type classifications were mostly attributed to the wrong location, particularly confusion of the superior and inferior hemifields. Numerical/text-based data extraction and interpretation was notably superior to image-based interpretation of VF defects. This study demonstrates both the potential and the limitations of multimodal LLMs in processing multimodal medical investigation data such as VF reports.

Bibliographic Details

Main Author: Jeremy C. K. Tan (Faculty of Medicine, University of New South Wales, Kensington, NSW 2033, Australia)
Format: Article
Language: English
Published: MDPI AG, 2025-04-01
Series: Vision
ISSN: 2411-5150
DOI: 10.3390/vision9020033
Collection: DOAJ
Subjects: large language model; vision language model; glaucoma; visual field
Online Access: https://www.mdpi.com/2411-5150/9/2/33
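The headline percentages reported in the abstract imply specific counts of correctly classified reports out of the 60 eyes studied. A minimal sketch, assuming the counts are simply back-calculated from the rounded percentages (58/60 for diagnosis, 44/60 for defect type; these counts are inferred, not taken from the paper's raw data):

```python
def accuracy(correct: int, total: int) -> float:
    """Return percentage accuracy rounded to one decimal place."""
    return round(100 * correct / total, 1)

N_EYES = 60  # single-page VF reports, one eye per subject

extraction = accuracy(60, N_EYES)   # data extraction / reliability: 100.0
diagnosis = accuracy(58, N_EYES)    # healthy vs suspect vs glaucoma: 96.7
defect_type = accuracy(44, N_EYES)  # defect-type classification: 73.3
```

With a cohort of only 60 eyes, each misclassified report shifts accuracy by about 1.7 percentage points, which is worth keeping in mind when comparing the per-domain figures.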