Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study

Bibliographic Details
Main Authors: Wei Wu, Zheng-gang Fang, Shu-qin Yang, Cai-xia Lv, Shu-yi An
Format: Article
Language: English
Published: BMJ Publishing Group 2022-07-01
Series: BMJ Open
Online Access: https://bmjopen.bmj.com/content/12/7/e056685.full
Collection: DOAJ

Description
Objective: The COVID-19 outbreak was first reported in Wuhan, China, and has been declared a pandemic because of its rapid spread worldwide. Predicting the trend of COVID-19 is important for its prevention. The autoregressive integrated moving average (ARIMA) model and the eXtreme Gradient Boosting (XGBoost) model were compared to determine which more accurately predicts the occurrence of COVID-19 in the USA.
Design: Time-series study.
Setting: The USA.
Main outcome measures: Three accuracy metrics, mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE), were used to evaluate the performance of the two models.
Results: On both the training set and the validation set, the XGBoost model had lower MAE, RMSE and MAPE than the ARIMA model.
Conclusions: Compared with the ARIMA model, the XGBoost model can improve the prediction of COVID-19 cases in the USA.
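The three accuracy metrics named in the abstract (MAE, RMSE and MAPE) have standard definitions, and lower values mean a closer fit, which is the sense in which XGBoost is reported to outperform ARIMA. The Python sketch below computes them for a pair of aligned actual and predicted series. It is illustrative only and is not the authors' code; the helper name `_check_lengths` and the numbers in the usage example are hypothetical, not taken from the study.

```python
import math
from typing import Sequence


def _check_lengths(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Hypothetical helper: both series must be non-empty and aligned point-for-point.
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # MAE = (1/n) * sum |y_t - yhat_t|
    _check_lengths(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # RMSE = sqrt((1/n) * sum (y_t - yhat_t)^2); penalises large misses more than MAE does
    _check_lengths(actual, predicted)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # MAPE = (100/n) * sum |(y_t - yhat_t) / y_t|, reported in percent
    _check_lengths(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical case counts and forecasts (made up for illustration, not study data)
    actual = [100, 200, 400]
    predicted = [110, 190, 380]
    print(f"MAE  = {mae(actual, predicted):.3f}")
    print(f"RMSE = {rmse(actual, predicted):.3f}")
    print(f"MAPE = {mape(actual, predicted):.3f}%")
```

MAE and RMSE are in the units of the series (here, cases), while MAPE is scale-free, which makes it easier to compare across series of different magnitudes; MAPE is undefined when any actual value is zero, so the sketch raises an error in that case.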

ISSN: 2044-6055
DOI: 10.1136/bmjopen-2021-056685
Volume: 12, Issue: 7
Affiliations:
Wei Wu: Department of Neurology, Qilu Hospital of Shandong University, Jinan, Shandong, China
Zheng-gang Fang, Shu-qin Yang, Cai-xia Lv: Department of Epidemiology, China Medical University, Shenyang, China
Shu-yi An: Department of Social Medicine and Health, Liaoning Provincial Center for Disease Control and Prevention, Shenyang, China