Evaluating ChatGPT’s diagnostic potential for pathology images
Background: Chat Generative Pretrained Transformer (ChatGPT) is a large language model (LLM) developed by OpenAI, known for its extensive knowledge base and interactive capabilities. These attributes make it a valuable tool in the medical field, particularly for tasks such as answering medical...
Main Authors: | Liya Ding, Lei Fan, Miao Shen, Yawen Wang, Kaiqin Sheng, Zijuan Zou, Huimin An, Zhinong Jiang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Medicine |
Subjects: | large language model; ChatGPT; pathology images; colon polyp; diagnosis |
Online Access: | https://www.frontiersin.org/articles/10.3389/fmed.2024.1507203/full |
author | Liya Ding, Lei Fan, Miao Shen, Yawen Wang, Kaiqin Sheng, Zijuan Zou, Huimin An, Zhinong Jiang |
author_sort | Liya Ding |
collection | DOAJ |
description | Background: Chat Generative Pretrained Transformer (ChatGPT) is a large language model (LLM) developed by OpenAI, known for its extensive knowledge base and interactive capabilities. These attributes make it a valuable tool in the medical field, particularly for tasks such as answering medical questions, drafting clinical notes, and optimizing the generation of radiology reports. However, maintaining accuracy in medical contexts remains the biggest challenge to employing GPT-4 in a clinical setting. This study investigates the accuracy of GPT-4, which can process both text and image inputs, in generating diagnoses from pathology images. Methods: The study analyzed 44 histopathological images from 16 organs and 100 colorectal biopsy photomicrographs. The initial evaluation was conducted with the standard GPT-4 model in January 2024, with a re-evaluation in July 2024. Diagnostic accuracy was assessed by comparing GPT-4's outputs to a reference standard using statistical measures. Additionally, four pathologists independently reviewed the same images so their diagnoses could be compared with the model's outputs. Both scanned and photographed images were tested to evaluate GPT-4's generalization across image types. Results: GPT-4 achieved an overall accuracy of 0.64 in identifying tumor images and tissue origins. For colon polyp classification, accuracy varied from 0.57 to 0.75 across subtypes. The model achieved 0.88 accuracy in distinguishing low-grade from high-grade dysplasia and 0.75 in distinguishing high-grade dysplasia from adenocarcinoma, with high sensitivity in detecting adenocarcinoma. Consistency between the initial and follow-up evaluations showed slight to moderate agreement, with kappa values ranging from 0.204 to 0.375. Conclusion: GPT-4 demonstrates the ability to diagnose pathology images, with improved performance over earlier versions. Its diagnostic accuracy in cancer is comparable to that of pathology residents. These findings suggest that GPT-4 holds promise as a supportive tool in pathology diagnostics, offering the potential to assist pathologists in routine diagnostic workflows. |
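The abstract reports test-retest consistency between the January and July 2024 evaluations as Cohen's kappa (0.204 to 0.375, slight to moderate agreement). As an illustrative aside, and not code from the article, here is a minimal sketch of the unweighted kappa computation on hypothetical diagnosis labels:

```python
# Illustrative only: unweighted Cohen's kappa, the agreement statistic the
# abstract cites for GPT-4's repeated evaluations. Labels below are invented.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items where the two ratings match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected matches given each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[l] * counts_b[l]
                   for l in set(counts_a) | set(counts_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical diagnoses for the same 10 images in two evaluation rounds:
round_1 = ["adenoma", "adenoma", "carcinoma", "adenoma", "carcinoma",
           "adenoma", "carcinoma", "adenoma", "adenoma", "carcinoma"]
round_2 = ["adenoma", "carcinoma", "carcinoma", "adenoma", "adenoma",
           "adenoma", "carcinoma", "carcinoma", "adenoma", "carcinoma"]
print(round(cohens_kappa(round_1, round_2), 3))  # prints 0.4
```

A kappa near 0.4 sits at the boundary the abstract's "slight to moderate" wording refers to; 1.0 would mean perfect round-to-round consistency and 0.0 agreement no better than chance.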
format | Article |
id | doaj-art-df711b8c5025482e81d54e0969652d0a |
institution | Kabale University |
issn | 2296-858X |
language | English |
publishDate | 2025-01-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Medicine |
spelling | doaj-art-df711b8c5025482e81d54e0969652d0a 2025-01-23T06:56:17Z eng Frontiers Media S.A. Frontiers in Medicine 2296-858X 2025-01-01 Vol. 11 10.3389/fmed.2024.1507203 Evaluating ChatGPT’s diagnostic potential for pathology images. Author affiliations: Liya Ding (Department of Pathology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China); Lei Fan (Department of Pathology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China; Department of Pathology, Ninghai County Traditional Chinese Medicine Hospital, Ningbo, China); Miao Shen (Department of Pathology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China; Department of Pathology, Deqing People’s Hospital, Hangzhou, China); Yawen Wang (College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, China); Kaiqin Sheng, Zijuan Zou, Huimin An, Zhinong Jiang (all Department of Pathology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China). Abstract as in the description field. Online access: https://www.frontiersin.org/articles/10.3389/fmed.2024.1507203/full. Keywords: large language model; ChatGPT; pathology images; colon polyp; diagnosis |
title | Evaluating ChatGPT’s diagnostic potential for pathology images |
topic | large language model ChatGPT pathology images colon polyp diagnosis |
url | https://www.frontiersin.org/articles/10.3389/fmed.2024.1507203/full |