Evaluating ChatGPT’s diagnostic potential for pathology images

Bibliographic Details
Main Authors: Liya Ding, Lei Fan, Miao Shen, Yawen Wang, Kaiqin Sheng, Zijuan Zou, Huimin An, Zhinong Jiang
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-01-01
Series: Frontiers in Medicine
Subjects: large language model; ChatGPT; pathology images; colon polyp; diagnosis
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2024.1507203/full
Description:
Background: Chat Generative Pretrained Transformer (ChatGPT) is a large language model (LLM) developed by OpenAI, known for its extensive knowledge base and interactive capabilities. These attributes make it a valuable tool in the medical field, particularly for tasks such as answering medical questions, drafting clinical notes, and optimizing the generation of radiology reports. However, maintaining accuracy in medical contexts remains the greatest obstacle to employing GPT-4 in a clinical setting. This study investigates the accuracy of GPT-4, which can process both text and image inputs, in generating diagnoses from pathological images.
Methods: The study analyzed 44 histopathological images from 16 organs and 100 colorectal biopsy photomicrographs. An initial evaluation was conducted with the standard GPT-4 model in January 2024, followed by a re-evaluation in July 2024. Diagnostic accuracy was assessed by comparing GPT-4's outputs against a reference standard using statistical measures. In addition, four pathologists independently reviewed the same images so that their diagnoses could be compared with the model's outputs. Both scanned and photographed images were tested to evaluate GPT-4's generalization across image types.
Results: GPT-4 achieved an overall accuracy of 0.64 in identifying tumors and their tissue of origin. For colon polyp classification, accuracy ranged from 0.57 to 0.75 across subtypes. The model reached 0.88 accuracy in distinguishing low-grade from high-grade dysplasia and 0.75 in distinguishing high-grade dysplasia from adenocarcinoma, with high sensitivity in detecting adenocarcinoma. Agreement between the initial and follow-up evaluations was slight to moderate, with Cohen's kappa values ranging from 0.204 to 0.375.
Conclusion: GPT-4 demonstrates the ability to diagnose pathological images, with improved performance over earlier versions, and its diagnostic accuracy in cancer is comparable to that of pathology residents. These findings suggest that GPT-4 holds promise as a supportive tool in pathology diagnostics, offering the potential to assist pathologists in routine diagnostic workflows.
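The statistical measures reported in the abstract (overall accuracy, per-class sensitivity, and Cohen's kappa for agreement between the January and July evaluations) can be sketched in a few lines. The label data below is purely hypothetical, not taken from the study; it only illustrates how such figures are computed.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference standard."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sensitivity(y_true, y_pred, positive):
    """True-positive rate for one class (e.g. adenocarcinoma)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / sum(t == positive for t in y_true)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two sets of ratings,
    e.g. a model's initial vs. follow-up outputs on the same images."""
    n = len(r1)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n)        # expected by chance
                for c in set(r1) | set(r2))
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical low-grade (LG) vs. high-grade (HG) dysplasia calls:
truth = ["LG", "LG", "HG", "HG", "LG", "HG", "LG", "HG"]
model = ["LG", "HG", "HG", "HG", "LG", "LG", "LG", "HG"]
print(accuracy(truth, model))             # 0.75
print(sensitivity(truth, model, "HG"))    # 0.75
print(cohens_kappa(truth, model))         # 0.5
```

A kappa in the 0.204-0.375 range reported by the study corresponds, on this scale, to only slight-to-fair agreement between the two evaluation rounds, which is why the abstract flags consistency as a limitation.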
ISSN: 2296-858X
DOI: 10.3389/fmed.2024.1507203
Author Affiliations:
- Liya Ding, Lei Fan, Miao Shen, Kaiqin Sheng, Zijuan Zou, Huimin An, Zhinong Jiang: Department of Pathology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Lei Fan (additional affiliation): Department of Pathology, Ninghai County Traditional Chinese Medicine Hospital, Ningbo, China
- Miao Shen (additional affiliation): Department of Pathology, Deqing People's Hospital, Hangzhou, China
- Yawen Wang: College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China