Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study
Main Authors: | William Trevena, Xiang Zhong, Michelle Alvarado, Alexander Semenov, Alp Oktay, Devin Devlin, Aarya Yogesh Gohil, Sai Harsha Chittimouju |
---|---|
Format: | Article |
Language: | English |
Published: | JMIR Publications, 2025-01-01 |
Series: | Journal of Medical Internet Research |
Online Access: | https://www.jmir.org/2025/1/e54601 |
_version_ | 1832577752898732032 |
---|---|
author | William Trevena; Xiang Zhong; Michelle Alvarado; Alexander Semenov; Alp Oktay; Devin Devlin; Aarya Yogesh Gohil; Sai Harsha Chittimouju |
author_facet | William Trevena; Xiang Zhong; Michelle Alvarado; Alexander Semenov; Alp Oktay; Devin Devlin; Aarya Yogesh Gohil; Sai Harsha Chittimouju |
author_sort | William Trevena |
collection | DOAJ |
description |
Background: The implementation of large language models (LLMs), such as BART (Bidirectional and Auto-Regressive Transformers) and GPT-4, has revolutionized the extraction of insights from unstructured text. These advances have extended into health care, enabling the analysis of social media for public health insights. However, the detection of drug discontinuation events (DDEs) remains underexplored. Identifying DDEs is crucial for understanding medication adherence and patient outcomes.
Objective: The aim of this study is to provide a flexible framework for investigating various clinical research questions in data-sparse environments. We illustrate the utility of this framework by identifying DDEs and their root causes in a publicly accessible web-based forum, MedHelp, and by releasing the first open-source DDE datasets to aid further research in this domain.
Methods: We used several LLMs, including GPT-4 Turbo, GPT-4o, DeBERTa (Decoding-Enhanced Bidirectional Encoder Representations from Transformers with Disentangled Attention), and BART, among others, to detect DDEs and determine their root causes in user comments posted on MedHelp. Our study design relied on zero-shot classification, which allows these models to make predictions without task-specific training. We split user comments into sentences and applied different classification strategies to assess how well the models identified DDEs and their root causes.
Results: Among the selected models, GPT-4o performed best at determining the root causes of DDEs, misclassifying only 12.9% of root-cause labels (Hamming loss). Among the open-source models tested, BART performed best at detecting DDEs, achieving an F1-score of 0.86, a false positive rate of 2.8%, and a false negative rate of 6.5%, all without any fine-tuning. The dataset contained only 10.7% (107/1000) DDEs, underscoring the models' robustness in an imbalanced data setting.
Conclusions: This study demonstrated the effectiveness of open- and closed-source LLMs, such as BART and GPT-4o, for detecting DDEs and their root causes in publicly accessible data through zero-shot classification. The robust and scalable framework we propose can help researchers address data-sparse clinical research questions, and the release of open-access DDE datasets has the potential to stimulate further research and novel discoveries in this field. |
format | Article |
id | doaj-art-86b31b83fe5040e1b9349428177d03a6 |
institution | Kabale University |
issn | 1438-8871 |
language | English |
publishDate | 2025-01-01 |
publisher | JMIR Publications |
record_format | Article |
series | Journal of Medical Internet Research |
spelling | doaj-art-86b31b83fe5040e1b9349428177d03a6; 2025-01-30T15:45:33Z; eng; JMIR Publications; Journal of Medical Internet Research; 1438-8871; 2025-01-01; 27; e54601; 10.2196/54601; Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study; William Trevena (https://orcid.org/0000-0001-7011-6867); Xiang Zhong (https://orcid.org/0000-0002-6214-5876); Michelle Alvarado (https://orcid.org/0000-0001-9649-214X); Alexander Semenov (https://orcid.org/0000-0003-2691-4575); Alp Oktay (https://orcid.org/0009-0007-2075-4896); Devin Devlin (https://orcid.org/0009-0005-2207-673X); Aarya Yogesh Gohil (https://orcid.org/0009-0008-3603-9480); Sai Harsha Chittimouju (https://orcid.org/0009-0002-4420-9220); [abstract as in the description field]; https://www.jmir.org/2025/1/e54601 |
spellingShingle | William Trevena; Xiang Zhong; Michelle Alvarado; Alexander Semenov; Alp Oktay; Devin Devlin; Aarya Yogesh Gohil; Sai Harsha Chittimouju; Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study; Journal of Medical Internet Research |
title | Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study |
title_full | Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study |
title_fullStr | Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study |
title_full_unstemmed | Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study |
title_short | Using Large Language Models to Detect and Understand Drug Discontinuation Events in Web-Based Forums: Development and Validation Study |
title_sort | using large language models to detect and understand drug discontinuation events in web based forums development and validation study |
url | https://www.jmir.org/2025/1/e54601 |
work_keys_str_mv | AT williamtrevena usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT xiangzhong usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT michellealvarado usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT alexandersemenov usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT alpoktay usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT devindevlin usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT aaryayogeshgohil usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy AT saiharshachittimouju usinglargelanguagemodelstodetectandunderstanddrugdiscontinuationeventsinwebbasedforumsdevelopmentandvalidationstudy |
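The Methods summary above describes detecting DDEs and their root causes through zero-shot classification of sentences drawn from MedHelp comments, with models such as BART making predictions without task-specific training, and the Results report multi-label root-cause performance as Hamming loss. A minimal sketch of how such a pipeline might look is shown below, using the Hugging Face zero-shot classification pipeline; the checkpoint (facebook/bart-large-mnli), the candidate labels, the 0.5 decision threshold, and the example sentences and annotations are illustrative assumptions, not the authors' actual configuration or data.

```python
# Minimal sketch: zero-shot DDE detection and root-cause labeling on forum
# sentences, loosely following the zero-shot setup described in the Methods
# summary. Checkpoint, labels, threshold, sentences, and annotations below
# are illustrative assumptions, not the paper's configuration or data.
from transformers import pipeline
from sklearn.metrics import hamming_loss

# BART fine-tuned on NLI, a common backbone for zero-shot classification;
# no task-specific training is performed.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical sentences (user comments would first be split into sentences).
sentences = [
    "I stopped taking the medication because of the side effects.",
    "My doctor increased my dose last week and I feel fine.",
]

# Step 1: DDE detection as a binary zero-shot decision between two labels.
dde_labels = ["stopped taking a medication", "still taking a medication"]
for sentence in sentences:
    result = classifier(sentence, candidate_labels=dde_labels)
    is_dde = result["labels"][0] == "stopped taking a medication"  # top-ranked label
    print(f"DDE={is_dde}: {sentence}")

# Step 2: root-cause identification as multi-label zero-shot classification
# over an assumed (not the paper's) cause taxonomy.
cause_labels = ["side effects", "cost", "doctor's advice", "felt better", "ineffective"]
result = classifier(sentences[0], candidate_labels=cause_labels, multi_label=True)
scores = dict(zip(result["labels"], result["scores"]))
predicted = [int(scores[label] >= 0.5) for label in cause_labels]  # assumed 0.5 threshold

# Step 3: evaluate multi-label predictions with Hamming loss, the metric the
# Results report for root causes (the annotation here is made up).
annotated = [1, 0, 0, 0, 0]  # annotation for sentences[0], in cause_labels order
print("Hamming loss:", hamming_loss([annotated], [predicted]))
```

Framing classification as entailment against natural-language label descriptions is what lets such models operate without fine-tuning, which is the property the study relies on for data-sparse questions and imbalanced data such as the 10.7% DDE prevalence reported above.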