Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology

Bibliographic Details
Main Authors: Prakasit Sa-ngiamwibool, Thiyaphat Laohawetwanit
Format: Article
Language: English
Published: Faculty of Medicine Siriraj Hospital 2025-02-01
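Series: Siriraj Medical Journal
Subjects: Artificial intelligence; ChatGPT; GPT-4; Liver cancer; Hepatocellular carcinoma; Cholangiocarcinoma
Online Access: https://he02.tci-thaijo.org/index.php/sirirajmedj/article/view/271596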
_version_ 1832545213269147648
author Prakasit Sa-ngiamwibool
Thiyaphat Laohawetwanit
author_facet Prakasit Sa-ngiamwibool
Thiyaphat Laohawetwanit
author_sort Prakasit Sa-ngiamwibool
collection DOAJ
description Objective: This study aims to compare the diagnostic accuracy of customized ChatGPT and human experts in identifying primary liver carcinoma using gross morphology. Materials and Methods: Gross morphology images of hepatocellular carcinoma (HCC) and cholangiocarcinoma (CCA) cases were assessed. These images were analyzed by two versions of customized ChatGPT (i.e., with and without a scoring system), pathology residents, and pathologist assistants. The diagnostic accuracy and consistency of each participant group were evaluated. Results: The study analyzed 128 liver carcinoma images (62 HCC, 66 CCA), with the participation of 13 pathology residents (median experience of 1.5 years) and three pathologist assistants (median experience of 5 years). When augmented with a scoring system, ChatGPT’s performance aligned closely with that of first- and second-year pathology residents and was significantly inferior to that of third-year pathology residents and pathologist assistants (p-values < 0.01). In contrast, the diagnostic accuracy of ChatGPT operating without the scoring system was significantly lower than that of all human participants (p-values < 0.01). Kappa statistics indicated that diagnostic consistency was slight to fair for both customized versions of ChatGPT and for the pathology residents, whereas interobserver agreement among the pathologist assistants was moderate. Conclusion: The study highlights the potential of ChatGPT for augmenting diagnostic processes in pathology. However, it also emphasizes the current limitations of this AI tool compared to human expertise, particularly that of experienced participants. This suggests the importance of integrating AI with human judgment in diagnostic pathology.
format Article
id doaj-art-114bf4415b4d48d88257430e826016b4
institution Kabale University
issn 2228-8082
language English
publishDate 2025-02-01
publisher Faculty of Medicine Siriraj Hospital
record_format Article
series Siriraj Medical Journal
spelling doaj-art-114bf4415b4d48d88257430e826016b4
2025-02-03T07:37:10Z
eng
Faculty of Medicine Siriraj Hospital
Siriraj Medical Journal
2228-8082
2025-02-01
77(2)
10.33192/smj.v77i2.271596
Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
Prakasit Sa-ngiamwibool (0)
Thiyaphat Laohawetwanit (1)
(0) Department of Pathology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand
(1) Division of Pathology, Chulabhorn International College of Medicine, Thammasat University, Pathum Thani, Thailand
https://he02.tci-thaijo.org/index.php/sirirajmedj/article/view/271596
spellingShingle Prakasit Sa-ngiamwibool
Thiyaphat Laohawetwanit
Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
Siriraj Medical Journal
Artificial intelligence
ChatGPT
GPT-4
Liver cancer
Hepatocellular carcinoma
Cholangiocarcinoma
title Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
title_full Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
title_fullStr Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
title_full_unstemmed Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
title_short Comparative Analysis of ChatGPT and Human Expertise in Diagnosing Primary Liver Carcinoma: A Focus on Gross Morphology
title_sort comparative analysis of chatgpt and human expertise in diagnosing primary liver carcinoma a focus on gross morphology
topic Artificial intelligence
ChatGPT
GPT-4
Liver cancer
Hepatocellular carcinoma
Cholangiocarcinoma
url https://he02.tci-thaijo.org/index.php/sirirajmedj/article/view/271596
work_keys_str_mv AT prakasitsangiamwibool comparativeanalysisofchatgptandhumanexpertiseindiagnosingprimarylivercarcinomaafocusongrossmorphology
AT thiyaphatlaohawetwanit comparativeanalysisofchatgptandhumanexpertiseindiagnosingprimarylivercarcinomaafocusongrossmorphology