Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting

Bibliographic Details
Main Authors:	Juan Carlos Moreno Sánchez, Héctor Gabriel Acosta Mesa, Adrián Trueba Espinosa, Sergio Ruiz Castilla, Farid García Lamont
Format:	Article
Language:	English
Published:	Elsevier, 2025-03-01
Series:	Smart Agricultural Technology
Subjects:	Support vector regression, Random forest, Extreme gradient boosting, grain yield, vegetation indices, climate data
Online Access:	http://www.sciencedirect.com/science/article/pii/S2772375525000255
Volume:	10 (article 100791)
ISSN:	2772-3755
Collection:	DOAJ
Affiliations:	Centro Universitario UAEM Texcoco, Universidad Autónoma del Estado de México, Mexico (Juan Carlos Moreno Sánchez, Adrián Trueba Espinosa, Sergio Ruiz Castilla, Farid García Lamont); Instituto de Investigaciones en Inteligencia Artificial, Universidad Veracruzana, Mexico (Héctor Gabriel Acosta Mesa)
Corresponding author:	Adrián Trueba Espinosa

Abstract

Plant breeding centers, in their relentless pursuit of more productive and resilient wheat varieties, have generated vast data repositories that are fundamental to global food security. This study uses these data to develop wheat grain yield (GY) prediction models with three machine learning techniques: Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost). The results demonstrate the potential of RF- and XGBoost-based models to predict wheat yield accurately. A major challenge of the research was identifying the variables most relevant to yield prediction. Clustering, feature selection, and variable-combination techniques showed that agronomic variables, particularly harvest index (HI) and biomass (BM), provide information complementary to the Normalized Difference Vegetation Index (NDVI). Analyzed with the XGBoost model, this combination of variables achieved an RMSE of 28.5082 g/m² and an R² of 0.9156, indicating that these indicators work well together. Further analysis showed that clustering and filtering the climatic variables at a daily resolution, especially precipitation rate, benefited these models.
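The abstract reports model performance as RMSE (in g/m²) and R². As a minimal sketch, assuming the standard definitions RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²) and R² = 1 − Σ (y_i − ŷ_i)² / Σ (y_i − ȳ)² (the record does not state the exact formulas, nor how scores were aggregated across validation splits; some studies report the squared Pearson correlation as R² instead), the two metrics can be computed as follows. The function names and the sample yields are hypothetical and are not taken from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the targets (e.g. g/m^2)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean of squared residuals, then square root.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted grain yields (g/m^2), for illustration only.
observed = [500.0, 620.0, 580.0, 710.0]
predicted = [520.0, 600.0, 575.0, 690.0]
print(rmse(observed, predicted))                  # → 17.5
print(round(r_squared(observed, predicted), 4))   # → 0.9464
```

Under these definitions, an RMSE of 28.5082 g/m² means that the square root of the mean squared difference between predicted and observed grain yield is about 28.5 g/m², and an R² of 0.9156 means the model's residual variance is roughly 8.4% of the variance of the observed yields.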