Classification Performance Comparison of BERT and IndoBERT on SelfReport of COVID-19 Status on Social Media

Bibliographic Details
Main Authors:	Irwan Budiman, Mohammad Reza Faisal, Astina Faridhah, Andi Farmadi, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Friska Abadi
Format:	Article
Language:	English
Published:	Lublin University of Technology, 2024-03-01
Series:	Journal of Computer Sciences Institute
Subjects:	Text Classification; Covid-19 symptoms; Twitter; BERT; IndoBERT
Online Access:	https://ph.pollub.pl/index.php/jcsi/article/view/5564

collection	DOAJ
description	Messages shared on social media platforms such as X can be automatically categorized into two groups: messages that self-report COVID-19 status and messages that do not. However, it is important to note that these messages cannot be a reliable tool for monitoring the spread of the COVID-19 pandemic. Social media messages can be classified by applying classification algorithms. Many deep learning algorithms, such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, have been used for text classification. However, CNNs are limited in capturing global context, while LSTMs focus mainly on word-by-word sequences. Moreover, both require large amounts of training data. Bidirectional Encoder Representations from Transformers (BERT) is a more recently developed text classification algorithm that can overcome these shortcomings, and many BERT variants now exist. The primary objective of this study was to compare the effectiveness of two classification models, BERT and IndoBERT, in identifying messages that self-report COVID-19 status. Both models were evaluated on raw and preprocessed text data from X. IndoBERT performed better, achieving an accuracy of 94%, whereas BERT achieved 82%.
id	doaj-art-8955cd688f214fc9a15668f4cdf36158
institution	Kabale University
issn	2544-0764
doi	10.35784/jcsi.5564
volume	30
affiliation	Lambung Mangkurat University (all authors)
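
The abstract reports that both models were evaluated on raw and on preprocessed X posts and compared by accuracy, but it does not list the preprocessing steps. The Python sketch below is illustrative only: the cleaning rules (lowercasing, stripping URLs, @mentions, the `#` symbol and other non-alphanumeric characters) are assumptions chosen for the example, not the authors' pipeline, and the names `preprocess` and `accuracy` are hypothetical. Accuracy is computed as the share of posts whose predicted label matches the true label, with 1 marking a self-report.

```python
import re

# Hypothetical cleaning rules for X (Twitter) posts. The abstract does not
# describe the authors' preprocessing; these are common choices only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALNUM_RE = re.compile(r"[^0-9a-z\s]")  # keeps ASCII letters, digits, whitespace
SPACES_RE = re.compile(r"\s+")


def preprocess(text: str) -> str:
    """Lowercase a post and strip URLs, mentions, '#' and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of posts whose predicted label (1 = self-report) is correct."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    # Made-up example post ("My swab result is POSITIVE").
    raw = "Hasil swab saya POSITIF #COVID19 @kemenkes https://t.co/abc"
    print(preprocess(raw))  # prints "hasil swab saya positif covid19"
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # prints 0.75
```

Because the `[^0-9a-z\s]` rule also removes accented and non-Latin characters, a real pipeline for Indonesian or multilingual posts might need a less aggressive filter.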

Similar Items