The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study

Background: Large language models (LLMs) have the potential to be applied to clinical practice. However, few studies have examined this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a c...

Bibliographic Details
Main Authors: Eun Jeong Gong, Chang Seok Bang, Jae Jun Lee, Jonghyung Park, Eunsil Kim, Subeen Kim, Minjae Kimm, Seoung-Ho Choi
Format: Article
Language: English
Published: MDPI AG 2024-12-01
Series: Bioengineering
Subjects: large language model; artificial intelligence; gastroenterology
Online Access: https://www.mdpi.com/2306-5354/12/1/1
_version_ 1832589029827149824
author Eun Jeong Gong
Chang Seok Bang
Jae Jun Lee
Jonghyung Park
Eunsil Kim
Subeen Kim
Minjae Kimm
Seoung-Ho Choi
author_sort Eun Jeong Gong
collection DOAJ
description Background: Large language models (LLMs) have the potential to be applied to clinical practice. However, few studies have examined this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a customized GPT model and a conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG). Method: We established a customized GPT with the BM25 algorithm using OpenAI’s GPT-4o model, which allows it to produce responses in the context of specific documents, including textbooks of internal medicine (in English) and gastroenterology (in Korean). We also used a conventional ChatGPT 4o (accessed on 16 October 2024). The benchmark (written in Korean) consisted of 15 clinical questions developed by four clinical experts, representing typical questions for medical students. The two LLMs, a gastroenterology fellow, and an expert gastroenterologist were tested to assess their performance. Results: The customized LLM correctly answered 8 out of 15 questions, whereas the fellow answered 10 correctly. When the standardized Korean medical terms were replaced with English terminology, the LLM’s performance improved: it answered two additional knowledge-based questions correctly, matching the fellow’s score. However, judgment-based questions remained a challenge for the model. Even with the implementation of ‘Chain of Thought’ prompt engineering, the customized GPT did not achieve improved reasoning. Conventional GPT-4o achieved the highest score among the AI models (14/15). Although both models performed slightly below the expert gastroenterologist’s level (15/15), they show promising potential for clinical applications (scores comparable with or higher than that of the gastroenterology fellow). Conclusions: LLMs could be utilized to assist with specialized tasks such as patient counseling. However, RAG capabilities, which enable real-time retrieval of external data not included in the training dataset, appear essential for managing complex, specialized content, and clinician oversight will remain crucial to ensure safe and effective use in clinical practice.
format Article
id doaj-art-20b736aa29e344ec98f5dc3bd81f9d87
institution Kabale University
issn 2306-5354
language English
publishDate 2024-12-01
publisher MDPI AG
record_format Article
series Bioengineering
spelling doaj-art-20b736aa29e344ec98f5dc3bd81f9d87
2025-01-24T13:22:54Z
eng
MDPI AG
Bioengineering
2306-5354
2024-12-01
12
1
1
10.3390/bioengineering12010001
The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study
Eun Jeong Gong (0)
Chang Seok Bang (1)
Jae Jun Lee (2)
Jonghyung Park (3)
Eunsil Kim (4)
Subeen Kim (5)
Minjae Kimm (6)
Seoung-Ho Choi (7)
Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea
Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea
Institute of New Frontier Research, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea
Meninblox, Inc., Gwangju 61008, Republic of Korea
Meninblox, Inc., Gwangju 61008, Republic of Korea
Meninblox, Inc., Gwangju 61008, Republic of Korea
Department of Plastic Art, Tech University of Korea, Siheung 15073, Republic of Korea
College of Liberal Arts Faculty of Basic Liberal Art, Hansung University, Seoul 02876, Republic of Korea
https://www.mdpi.com/2306-5354/12/1/1
large language model
artificial intelligence
gastroenterology
title The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study
title_sort potential clinical utility of the customized large language model in gastroenterology a pilot study
topic large language model
artificial intelligence
gastroenterology
url https://www.mdpi.com/2306-5354/12/1/1