Machine Learning Classifiers and Data Synthesis Techniques to Tackle with Highly Imbalanced COVID-19 Data
Format: Article
Language: English
Published: Ferdowsi University of Mashhad, 2024-12-01
Series: Computer and Knowledge Engineering
Online Access: https://cke.um.ac.ir/article_45898_b3c8e1d9ecf92ea8a3734a1aab782226.pdf
Summary: The COVID-19 pandemic has highlighted the urgent need for rapid and accurate diagnostic methods. In this study, we evaluate three machine learning models for detecting COVID-19: Random Forest (RF), Logistic Regression (LR), and Decision Tree (DT). The models are trained on preprocessed, imbalanced data comprising 5086 negative and 558 positive cases, roughly nine negatives for every positive. To address this class imbalance, we apply two advanced data synthesis algorithms, Conditional Tabular Generative Adversarial Network (CTGAN) and Tabular Variational Autoencoder (TVAE), to balance the dataset. For comparison, classifiers trained on the original dataset and on the balanced datasets are all evaluated. RF obtains the highest accuracy, 98.83%, on the CTGAN-balanced dataset. These results support the potential of coupling data synthesis with traditional machine learning for COVID-19 diagnosis, and we hope this research contributes to current artificial intelligence (AI) efforts to combat the pandemic.
ISSN: 2538-5453, 2717-4123
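The degree of imbalance described in the summary can be quantified directly from the two reported counts. The minimal sketch below computes the majority-to-minority ratio, the accuracy that a trivial "always negative" classifier would reach on the original data, and the number of synthetic positive rows needed to reach a 1:1 class ratio. The 1:1 target is an assumption for illustration only; the record does not state the target ratio, the train/test split, or how the synthetic rows were generated. The function names are hypothetical helpers, not part of the authors' code.

```python
"""Class-imbalance arithmetic for the counts reported in the summary.

Assumption (not stated in the record): "balanced" is taken to mean that the
minority (positive) class is oversampled with synthetic rows until both
classes have equal size.
"""

NEGATIVE = 5086  # negative cases reported in the summary
POSITIVE = 558   # positive cases reported in the summary


def imbalance_ratio(majority: int, minority: int) -> float:
    """Return how many majority-class samples exist per minority-class sample."""
    return majority / minority


def majority_baseline_accuracy(majority: int, minority: int) -> float:
    """Accuracy of a trivial classifier that always predicts the majority class."""
    return majority / (majority + minority)


def synthetic_rows_for_parity(majority: int, minority: int) -> int:
    """Synthetic minority rows needed for a 1:1 ratio (hypothetical target)."""
    return max(majority - minority, 0)


if __name__ == "__main__":
    print(f"Imbalance ratio: {imbalance_ratio(NEGATIVE, POSITIVE):.2f} : 1")
    print(f"Always-negative accuracy: "
          f"{majority_baseline_accuracy(NEGATIVE, POSITIVE):.2%}")
    print(f"Synthetic positives for 1:1: "
          f"{synthetic_rows_for_parity(NEGATIVE, POSITIVE)}")
```

On the original data, a classifier that never predicts a positive case would already score about 90.11% accuracy. This is one reason why accuracy on imbalanced data is usually read alongside class-sensitive metrics such as recall or F1.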