Classification Performance Comparison of BERT and IndoBERT on SelfReport of COVID-19 Status on Social Media
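Main Authors: | Irwan Budiman, Mohammad Reza Faisal, Astina Faridhah, Andi Farmadi, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Friska Abadi |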
---|---|
Format: | Article |
Language: | English |
Published: | Lublin University of Technology, 2024-03-01 |
Series: | Journal of Computer Sciences Institute |
Subjects: | Text Classification; Covid-19 symptoms; Twitter; BERT; IndoBERT |
Online Access: | https://ph.pollub.pl/index.php/jcsi/article/view/5564 |
_version_ | 1832570013165289472 |
---|---|
author | Irwan Budiman, Mohammad Reza Faisal, Astina Faridhah, Andi Farmadi, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Friska Abadi |
collection | DOAJ |
description |
Messages shared on social media platforms like X are automatically categorized into two groups: those that self-report COVID-19 status and those that do not. However, it is essential to note that these messages cannot serve as a reliable monitoring tool for tracking the spread of the COVID-19 pandemic. Social media messages can be classified by applying classification algorithms. Many deep learning-based algorithms, such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), have been used for text classification. However, CNN has limitations in understanding global context, while LSTM focuses more on understanding word-by-word sequences. In addition, both require a large amount of data to learn. A more recent algorithm for text classification, Bidirectional Encoder Representations from Transformers (BERT), can cover the shortcomings of these earlier algorithms, and many BERT variants now exist. The primary objective of this study was to compare the effectiveness of two classification models, BERT and IndoBERT, in identifying messages that self-report COVID-19 status. Both models were evaluated using raw and preprocessed text data from X. The findings showed that the IndoBERT model performed better, achieving an accuracy of 94%, whereas the BERT model reached 82%.
|
id | doaj-art-8955cd688f214fc9a15668f4cdf36158 |
institution | Kabale University |
issn | 2544-0764 |
spelling | doaj-art-8955cd688f214fc9a15668f4cdf36158; 2025-02-02T18:01:31Z; eng; Lublin University of Technology; Journal of Computer Sciences Institute; ISSN 2544-0764; 2024-03-01; vol. 30; DOI 10.35784/jcsi.5564; Classification Performance Comparison of BERT and IndoBERT on SelfReport of COVID-19 Status on Social Media; Irwan Budiman, Mohammad Reza Faisal, Astina Faridhah, Andi Farmadi, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Friska Abadi (all Lambung Mangkurat University); https://ph.pollub.pl/index.php/jcsi/article/view/5564; keywords: Text Classification, Covid-19 symptoms, Twitter, BERT, IndoBERT |
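The description above reports results as accuracy on a two-class task (self-report versus not). As a minimal sketch of how such a comparison is scored, the Python below computes accuracy for two sets of predictions against the same labels. The function name `accuracy`, the variable names, and the toy labels are illustrative assumptions; they are not the study's data, code, or evaluation protocol.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return the fraction of messages whose predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical toy labels (1 = self-report, 0 = not a self-report).
# These values are made up for illustration and are NOT from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
pred_model_a = [1, 0, 1, 1, 0, 0, 1, 1]  # differs from y_true in 1 position
pred_model_b = [1, 0, 0, 1, 1, 0, 1, 1]  # differs from y_true in 3 positions

acc_a = accuracy(y_true, pred_model_a)
acc_b = accuracy(y_true, pred_model_b)
print(f"model A accuracy: {acc_a:.3f}")
print(f"model B accuracy: {acc_b:.3f}")
```

Scoring both models on the same held-out labels is what makes the two numbers comparable; accuracy alone can hide class imbalance, so a real evaluation would likely report per-class metrics as well.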