The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study
Background: The large language model (LLM) has the potential to be applied to clinical practice. However, few studies have examined this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a c...
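Main Authors: | Eun Jeong Gong, Chang Seok Bang, Jae Jun Lee, Jonghyung Park, Eunsil Kim, Subeen Kim, Minjae Kimm, Seoung-Ho Choi |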
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2024-12-01 |
Series: | Bioengineering |
Subjects: | large language model, artificial intelligence, gastroenterology |
Online Access: | https://www.mdpi.com/2306-5354/12/1/1 |
_version_ | 1832589029827149824 |
---|---|
author | Eun Jeong Gong; Chang Seok Bang; Jae Jun Lee; Jonghyung Park; Eunsil Kim; Subeen Kim; Minjae Kimm; Seoung-Ho Choi |
author_sort | Eun Jeong Gong |
collection | DOAJ |
description | Background: The large language model (LLM) has the potential to be applied to clinical practice. However, few studies have examined this in the field of gastroenterology. Aim: This study explores the potential clinical utility of two LLMs in the field of gastroenterology: a customized GPT model and a conventional GPT-4o, an advanced LLM capable of retrieval-augmented generation (RAG). Method: We established a customized GPT with the BM25 algorithm using OpenAI’s GPT-4o model, which allows it to produce responses in the context of specific documents, including textbooks of internal medicine (in English) and gastroenterology (in Korean). We also prepared access to a conventional ChatGPT 4o (accessed on 16 October 2024). The benchmark (written in Korean) consisted of 15 clinical questions developed by four clinical experts, representing typical questions for medical students. The two LLMs, a gastroenterology fellow, and an expert gastroenterologist were tested to assess their performance. Results: While the customized LLM correctly answered 8 out of 15 questions, the fellow answered 10 correctly. When the standardized Korean medical terms were replaced with English terminology, the LLM’s performance improved, answering two additional knowledge-based questions correctly, matching the fellow’s score. However, judgment-based questions remained a challenge for the model. Even with the implementation of ‘Chain of Thought’ prompt engineering, the customized GPT did not achieve improved reasoning. Conventional GPT-4o achieved the highest score among the AI models (14/15). Although both models performed slightly below the expert gastroenterologist’s level (15/15), they show promising potential for clinical applications (scores comparable with or higher than that of the gastroenterology fellow). Conclusions: LLMs could be utilized to assist with specialized tasks such as patient counseling. However, RAG capabilities, which enable real-time retrieval of external data not included in the training dataset, appear essential for managing complex, specialized content, and clinician oversight will remain crucial to ensure safe and effective use in clinical practice. |
format | Article |
id | doaj-art-20b736aa29e344ec98f5dc3bd81f9d87 |
institution | Kabale University |
issn | 2306-5354 |
language | English |
publishDate | 2024-12-01 |
publisher | MDPI AG |
record_format | Article |
series | Bioengineering |
spelling | doaj-art-20b736aa29e344ec98f5dc3bd81f9d87; 2025-01-24T13:22:54Z; eng; MDPI AG; Bioengineering; 2306-5354; 2024-12-01; 12; 1; 1; 10.3390/bioengineering12010001; The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study; Eun Jeong Gong (0); Chang Seok Bang (1); Jae Jun Lee (2); Jonghyung Park (3); Eunsil Kim (4); Subeen Kim (5); Minjae Kimm (6); Seoung-Ho Choi (7); Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea; Department of Internal Medicine, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea; Institute of New Frontier Research, Hallym University College of Medicine, Chuncheon 24253, Republic of Korea; Meninblox, Inc., Gwangju 61008, Republic of Korea; Meninblox, Inc., Gwangju 61008, Republic of Korea; Meninblox, Inc., Gwangju 61008, Republic of Korea; Department of Plastic Art, Tech University of Korea, Siheung 15073, Republic of Korea; College of Liberal Arts Faculty of Basic Liberal Art, Hansung University, Seoul 02876, Republic of Korea; https://www.mdpi.com/2306-5354/12/1/1; large language model; artificial intelligence; gastroenterology |
title | The Potential Clinical Utility of the Customized Large Language Model in Gastroenterology: A Pilot Study |
title_sort | potential clinical utility of the customized large language model in gastroenterology a pilot study |
topic | large language model; artificial intelligence; gastroenterology |
url | https://www.mdpi.com/2306-5354/12/1/1 |