Rainfall Prediction Using Integrated Machine Learning Models With K-Means Clustering: A Representative Case Study of Harirud Murghab Basin-Afghanistan

Bibliographic Details
Main Authors: Ziaul Haq Haq Doost, Ali Alsuwaiyan, Abdulazeez Abdulraheem, Nabil M. Al-Areeq, Zaher Mundher Yaseen
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11045676/
Description
Summary: Accurate rainfall prediction is essential for effective water resource management and disaster preparedness, especially in regions with limited observational data such as Afghanistan. The objective of this study was to develop a reliable machine learning (ML) model for rainfall prediction by integrating satellite-derived meteorological data with ground-based observational rainfall data. Four ML models were used: Gradient Boosting Regressor (GBR), Hist Gradient Boosting Regressor (HGBR), Random Forest Regressor (RFR), and Extreme Gradient Boosting Regressor (XGBR). The models were evaluated at three stations (Nazdik-i Herat, Shinya, and Torghundi) using the coefficient of determination (R²), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and median absolute error (MedAE) as evaluation metrics. At the Nazdik-i Herat station, the HGBR model achieved an R² of 0.90 in the training phase and 0.83 in the testing phase, an RMSE of 7.35 and 10.25, and an MAE of 5.41 and 6.45 for the training and testing phases, respectively. At the Shinya station, the HGBR model obtained an R² of 0.76 and 0.76, an RMSE of 10.91 and 9.51, and an MAE of 6.25 and 7.44 for the training and testing phases, respectively. At the Torghundi station, the same model recorded an R² of 0.92 and 0.80, an RMSE of 5.51 and 8.54, and an MAE of 3.97 and 5.97 for the training and testing phases, respectively. While the other models also performed well, HGBR was the only model that consistently maintained high accuracy and low error across both the training and testing phases at all stations, making it the best-performing model for monthly rainfall prediction in the region. Scientifically, this study demonstrates the effectiveness of satellite data and advanced ML models for monthly rainfall prediction in data-limited regions. Practically, it offers a cost-effective alternative to physical rain gauge installations by providing a reliable method for estimating monthly rainfall to support irrigation planning, water resource management, and disaster preparedness in climate-vulnerable regions.
ISSN: 2169-3536
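
Note on the evaluation metrics: the summary reports R², MSE, RMSE, MAE, and MedAE for each station. The following is a minimal sketch of how these five metrics are commonly defined, written in plain Python. The function name regression_metrics and the sample rainfall values are hypothetical and chosen only for illustration; they are not taken from the study, and the authors' actual implementation is not described in this record.

```python
import math
from statistics import median


def regression_metrics(observed, predicted):
    """Return R2, MSE, RMSE, MAE and MedAE for paired observed/predicted values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    abs_errors = [abs(e) for e in errors]

    # Residual sum of squares and total sum of squares around the observed mean
    ss_res = sum(e * e for e in errors)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)

    mse = ss_res / n
    return {
        # R2 is undefined when the observations have zero variance
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs_errors) / n,
        "MedAE": median(abs_errors),
    }


# Hypothetical monthly rainfall totals (illustrative values, not data from the study)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

RMSE is the square root of MSE, so the two always carry the same information. MedAE is less sensitive than MAE to a few months with very large errors, which is one plausible reason for reporting both.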