Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting
Plant breeding centers, in their relentless pursuit of more productive and resilient wheat varieties, have generated vast data repositories that are fundamental to ensuring global food security. This study uses these data to develop a wheat grain yield (GY) prediction model, using machine learning t...
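Main Authors: | Juan Carlos Moreno Sánchez, Héctor Gabriel Acosta Mesa, Adrián Trueba Espinosa, Sergio Ruiz Castilla, Farid García Lamont |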
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-03-01 |
Series: | Smart Agricultural Technology |
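Subjects: | Support vector regression, Random forest, Extreme gradient boosting, grain yield, vegetation indices, climate data |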
Online Access: | http://www.sciencedirect.com/science/article/pii/S2772375525000255 |
_version_ | 1832573031357087744 |
---|---|
author | Juan Carlos Moreno Sánchez; Héctor Gabriel Acosta Mesa; Adrián Trueba Espinosa; Sergio Ruiz Castilla; Farid García Lamont |
author_facet | Juan Carlos Moreno Sánchez; Héctor Gabriel Acosta Mesa; Adrián Trueba Espinosa; Sergio Ruiz Castilla; Farid García Lamont |
author_sort | Juan Carlos Moreno Sánchez |
collection | DOAJ |
description | Plant breeding centers, in their relentless pursuit of more productive and resilient wheat varieties, have generated vast data repositories that are fundamental to ensuring global food security. This study uses these data to develop a wheat grain yield (GY) prediction model, using machine learning techniques such as Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost). The results demonstrate the potential of RF- and XGBoost-based models to predict wheat yield accurately. A major challenge of this research was identifying the most relevant variables for predicting wheat yield. Clustering, feature selection, and variable combination techniques showed that agronomic variables, particularly harvest index (HI) and biomass (BM), provide information complementary to the Normalized Difference Vegetation Index (NDVI). This combination, modeled with XGBoost, achieved strong performance, with an RMSE of 28.5082 g/m² and an R² of 0.9156, indicating that these indicators reinforce one another. Further analysis showed that daily clustering and filtering of climatic variables, especially precipitation rate, benefited these types of models. |
format | Article |
id | doaj-art-b4798079881b4275a7da400d9fcae95b |
institution | Kabale University |
issn | 2772-3755 |
language | English |
publishDate | 2025-03-01 |
publisher | Elsevier |
record_format | Article |
series | Smart Agricultural Technology |
spelling | doaj-art-b4798079881b4275a7da400d9fcae95b; 2025-02-02T05:29:30Z; eng; Elsevier; Smart Agricultural Technology; 2772-3755; 2025-03-01; 10; 100791; Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting; Juan Carlos Moreno Sánchez (0), Héctor Gabriel Acosta Mesa (1), Adrián Trueba Espinosa (2, corresponding author), Sergio Ruiz Castilla (3), Farid García Lamont (4); (0) Centro Universitario UAEM Texcoco, Universidad Autónoma del Estado de México, Mexico; (1) Instituto en Investigaciones en Inteligencia Artificial, Universidad Veracruzana, Mexico; (2) Centro Universitario UAEM Texcoco, Universidad Autónoma del Estado de México, Mexico; (3) Centro Universitario UAEM Texcoco, Universidad Autónoma del Estado de México, Mexico; (4) Centro Universitario UAEM Texcoco, Universidad Autónoma del Estado de México, Mexico; Plant breeding centers, in their relentless pursuit of more productive and resilient wheat varieties, have generated vast data repositories that are fundamental to ensuring global food security. This study uses these data to develop a wheat grain yield (GY) prediction model, using machine learning techniques such as Random Forest (RF), Support Vector Regression (SVR), and Extreme Gradient Boosting (XGBoost). The results demonstrate the potential of RF- and XGBoost-based models to predict wheat yield accurately. A major challenge of this research was identifying the most relevant variables for predicting wheat yield. Clustering, feature selection, and variable combination techniques showed that agronomic variables, particularly harvest index (HI) and biomass (BM), provide information complementary to the Normalized Difference Vegetation Index (NDVI). This combination, modeled with XGBoost, achieved strong performance, with an RMSE of 28.5082 g/m² and an R² of 0.9156, indicating that these indicators reinforce one another. Further analysis showed that daily clustering and filtering of climatic variables, especially precipitation rate, benefited these types of models.; http://www.sciencedirect.com/science/article/pii/S2772375525000255; Support vector regression, Random forest, Extreme gradient boosting, grain yield, vegetation indices, climate data |
spellingShingle | Juan Carlos Moreno Sánchez; Héctor Gabriel Acosta Mesa; Adrián Trueba Espinosa; Sergio Ruiz Castilla; Farid García Lamont; Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting; Smart Agricultural Technology; Support vector regression, Random forest, Extreme gradient boosting, grain yield, vegetation indices, climate data |
title | Improving wheat yield prediction through variable selection using Support Vector Regression, Random Forest, and Extreme Gradient Boosting |
title_sort | improving wheat yield prediction through variable selection using support vector regression random forest and extreme gradient boosting |
topic | Support vector regression, Random forest, Extreme gradient boosting, grain yield, vegetation indices, climate data |
url | http://www.sciencedirect.com/science/article/pii/S2772375525000255 |
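
The description above reports model quality as an RMSE (in grams per square meter) and an R², and names NDVI as one of the predictors. As a reading aid, the following minimal sketch shows how these quantities are conventionally computed. It is not the authors' code: the function names and the sample values are hypothetical and chosen only for illustration, and the standard textbook definitions are assumed.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs (e.g. g/m^2)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values are not all identical (SS_tot > 0).
    """
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


# Hypothetical grain yields in g/m^2; these are NOT values from the study.
observed_gy = [400.0, 450.0, 500.0, 550.0]
predicted_gy = [410.0, 440.0, 510.0, 540.0]

print(rmse(observed_gy, predicted_gy))       # every error is 10 g/m^2
print(r_squared(observed_gy, predicted_gy))
print(ndvi(0.5, 0.1))
```

Because RMSE keeps the units of the target, the reported 28.5082 is a typical prediction error in g/m², while R² is unitless and expresses the share of yield variance the model explains.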