Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation
Abstract Background: The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model's comprehension of clinical context...
| Main Authors: | Kyungmo Kim, Seongkeun Park, Jeongwon Min, Sumin Park, Ju Yeon Kim, Jinsu Eun, Kyuha Jung, Yoobin Elyson Park, Esther Kim, Eun Young Lee, Joonhwan Lee, Jinwook Choi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | JMIR Publications, 2024-10-01 |
| Series: | JMIR Medical Informatics |
| Online Access: | https://medinform.jmir.org/2024/1/e52897 |
| _version_ | 1850199971065757696 |
|---|---|
| author | Kyungmo Kim Seongkeun Park Jeongwon Min Sumin Park Ju Yeon Kim Jinsu Eun Kyuha Jung Yoobin Elyson Park Esther Kim Eun Young Lee Joonhwan Lee Jinwook Choi |
| author_facet | Kyungmo Kim Seongkeun Park Jeongwon Min Sumin Park Ju Yeon Kim Jinsu Eun Kyuha Jung Yoobin Elyson Park Esther Kim Eun Young Lee Joonhwan Lee Jinwook Choi |
| author_sort | Kyungmo Kim |
| collection | DOAJ |
| description |
Abstract
Background: The bidirectional encoder representations from transformers (BERT) model has attracted considerable attention in clinical applications, such as patient classification and disease prediction. However, current studies have typically progressed to application development without a thorough assessment of the model’s comprehension of clinical context. Furthermore, limited comparative studies have been conducted on BERT models using medical documents from non–English-speaking countries. Therefore, the applicability of BERT models trained on English clinical notes to non-English contexts is yet to be confirmed. To address these gaps in the literature, this study focused on identifying the most effective BERT model for non-English clinical notes.
Objective: In this study, we evaluated the contextual understanding abilities of various BERT models applied to mixed Korean and English clinical notes. The objective was to identify the BERT model that best understands the context of such documents.
Methods: Using data from 164,460 patients in a South Korean tertiary hospital, we pretrained BERT-base, BERT for Biomedical Text Mining (BioBERT), Korean BERT (KoBERT), and Multilingual BERT (M-BERT) to improve their contextual comprehension capabilities and subsequently compared their performance on 7 fine-tuning tasks.
Results: Model performance varied by task and token usage. First, BERT-base and BioBERT excelled in tasks using classification ([CLS]) token embeddings, such as document classification, with BioBERT achieving the highest F1-score among the compared models.
Conclusions: This study highlighted the effectiveness of various BERT models in a multilingual clinical domain. The findings can be used as a reference in clinical and language-based applications. (A minimal code sketch illustrating the pretraining and [CLS]-based fine-tuning setup described here appears after the record below.) |
| format | Article |
| id | doaj-art-54ac1dc40a464c26a2554efa5c933723 |
| institution | OA Journals |
| issn | 2291-9694 |
| language | English |
| publishDate | 2024-10-01 |
| publisher | JMIR Publications |
| record_format | Article |
| series | JMIR Medical Informatics |
| spelling | doaj-art-54ac1dc40a464c26a2554efa5c933723 2025-08-20T02:12:29Z eng. JMIR Publications, JMIR Medical Informatics, ISSN 2291-9694, 2024-10-01, vol 12, e52897, DOI 10.2196/52897. Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation. Authors: Kyungmo Kim (http://orcid.org/0000-0002-8974-5302); Seongkeun Park (http://orcid.org/0000-0002-4868-9404); Jeongwon Min (http://orcid.org/0000-0001-8412-5545); Sumin Park (http://orcid.org/0000-0002-9917-2579); Ju Yeon Kim (http://orcid.org/0000-0001-8982-6869); Jinsu Eun (http://orcid.org/0000-0003-3051-7193); Kyuha Jung (http://orcid.org/0000-0002-5442-391X); Yoobin Elyson Park (http://orcid.org/0000-0002-3844-1333); Esther Kim (http://orcid.org/0000-0002-9576-4411); Eun Young Lee (http://orcid.org/0000-0001-6975-8627); Joonhwan Lee (http://orcid.org/0000-0002-3115-4024); Jinwook Choi (http://orcid.org/0000-0002-9424-9944). Abstract as above. https://medinform.jmir.org/2024/1/e52897 |
| spellingShingle | Kyungmo Kim Seongkeun Park Jeongwon Min Sumin Park Ju Yeon Kim Jinsu Eun Kyuha Jung Yoobin Elyson Park Esther Kim Eun Young Lee Joonhwan Lee Jinwook Choi Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation JMIR Medical Informatics |
| title | Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation |
| title_full | Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation |
| title_fullStr | Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation |
| title_full_unstemmed | Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation |
| title_short | Multifaceted Natural Language Processing Task–Based Evaluation of Bidirectional Encoder Representations From Transformers Models for Bilingual (Korean and English) Clinical Notes: Algorithm Development and Validation |
| title_sort | multifaceted natural language processing task based evaluation of bidirectional encoder representations from transformers models for bilingual korean and english clinical notes algorithm development and validation |
| url | https://medinform.jmir.org/2024/1/e52897 |
| work_keys_str_mv | AT kyungmokim multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT seongkeunpark multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT jeongwonmin multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT suminpark multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT juyeonkim multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT jinsueun multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT kyuhajung multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT yoobinelysonpark multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT estherkim multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT eunyounglee multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT joonhwanlee multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation AT jinwookchoi multifacetednaturallanguageprocessingtaskbasedevaluationofbidirectionalencoderrepresentationsfromtransformersmodelsforbilingualkoreanandenglishclinicalnotesalgorithmdevelopmentandvalidation |
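The abstract above describes a two-stage workflow: further pretraining public BERT checkpoints on bilingual (Korean and English) clinical notes, then fine-tuning them on downstream tasks, some of which (such as document classification) rely on the [CLS] token embedding. The sketch below illustrates that kind of workflow with the Hugging Face Transformers library; it is not the authors' code. The checkpoint name bert-base-multilingual-cased stands in for M-BERT, and the corpus file, output directory, label count, and hyperparameters are illustrative assumptions only.

```python
# Minimal sketch: (1) domain-adaptive masked-LM pretraining on raw clinical notes,
# (2) [CLS]-based document classification fine-tuning on the adapted encoder.
# Paths, hyperparameters, and label count are hypothetical.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

checkpoint = "bert-base-multilingual-cased"  # M-BERT; BERT-base, BioBERT, KoBERT are analogous
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# --- Stage 1: further pretraining with masked language modeling ---
notes = load_dataset("text", data_files={"train": "clinical_notes.txt"})  # hypothetical corpus file
tokenized = notes.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)
mlm_model = AutoModelForMaskedLM.from_pretrained(checkpoint)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="mbert-clinical-mlm", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
mlm_model.save_pretrained("mbert-clinical-mlm")

# --- Stage 2: fine-tuning for document classification ---
# BertForSequenceClassification pools the [CLS] position and feeds it to a linear
# classification head, i.e., the "[CLS] token embedding" tasks mentioned in the abstract.
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "mbert-clinical-mlm", num_labels=2  # num_labels is an illustrative assumption
)
# clf_model would then be trained with a labeled note dataset via another Trainer run.
```

The same two stages would be repeated for each compared checkpoint (BERT-base, BioBERT, KoBERT, M-BERT) before evaluating them on the 7 fine-tuning tasks.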