Classification Performance Comparison of BERT and IndoBERT on SelfReport of COVID-19 Status on Social Media

Bibliographic Details
Main Authors: Irwan Budiman, Mohammad Reza Faisal, Astina Faridhah, Andi Farmadi, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Friska Abadi
Format: Article
Language: English
Published: Lublin University of Technology 2024-03-01
Series: Journal of Computer Sciences Institute
Subjects: Text Classification; Covid-19 symptoms; Twitter; BERT; IndoBERT
Online Access: https://ph.pollub.pl/index.php/jcsi/article/view/5564
collection DOAJ
description Messages shared on social media platforms such as X are automatically categorized into two groups: those that self-report COVID-19 status and those that do not. However, it is essential to note that these messages cannot serve as a reliable tool for monitoring the spread of the COVID-19 pandemic. Social media messages can be categorized with classification algorithms. Many deep learning-based algorithms, such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), have been used for text classification. However, CNN is limited in its understanding of global context, while LSTM focuses more on word-by-word sequences, and both require a large amount of training data. Bidirectional Encoder Representations from Transformers (BERT) is a more recent algorithm for text classification that addresses the shortcomings of these earlier algorithms, and many BERT variants now exist. The primary objective of this study was to compare the effectiveness of two classification models, BERT and IndoBERT, in identifying messages that self-report COVID-19 status. Both models were evaluated on raw and preprocessed text data from X. The IndoBERT model performed better, achieving an accuracy of 94%, whereas the BERT model reached 82%.
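The description above mentions evaluating both models on raw and preprocessed text and comparing them by accuracy. The record does not say which preprocessing steps were used, so the following is only a minimal sketch under assumptions: the `preprocess` function (lowercasing, removing URLs and @mentions) and the sample labels are hypothetical illustrations, not the authors' pipeline.

```python
import re


def preprocess(text: str) -> str:
    """Hypothetical cleanup: lowercase, drop URLs and @mentions, collapse spaces.

    These steps are assumptions for illustration; the source record does not
    list the preprocessing actually applied.
    """
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove user mentions
    return re.sub(r"\s+", " ", text).strip()   # normalize whitespace


def accuracy(y_true: list, y_pred: list) -> float:
    """Fraction of predictions that match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Made-up example message and labels (1 = self-report, 0 = not).
    print(preprocess("Saya POSITIF covid  @dinkes https://t.co/abc"))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Under this definition, an accuracy of 94% would mean that 94 of every 100 evaluated messages received the correct label.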
id doaj-art-8955cd688f214fc9a15668f4cdf36158
institution Kabale University
issn 2544-0764
spelling doaj-art-8955cd688f214fc9a15668f4cdf36158; 2025-02-02T18:01:31Z; eng; Lublin University of Technology; Journal of Computer Sciences Institute; 2544-0764; 2024-03-01; 30; 10.35784/jcsi.5564; Irwan Budiman, Mohammad Reza Faisal, Astina Faridhah, Andi Farmadi, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Friska Abadi (all: Lambung Mangkurat University); https://ph.pollub.pl/index.php/jcsi/article/view/5564; Text Classification; Covid-19 symptoms; Twitter; BERT; IndoBERT
topic Text Classification
Covid-19 symptoms
Twitter
BERT
IndoBERT