Analysis of the 50-mile ultramarathon distance using a predictive XGBoost model

Abstract Although the 50-mile ultramarathon is one of the most common race distances, it has received little scientific attention. The objective of this study was to assess how an athlete’s age group, sex, and nationality, together with the race location, affect race speed. Utilizing a dataset with ultramarathon...

Bibliographic Details
Main Authors: Jonas Turnwald, David Valero, Pedro Forte, Katja Weiss, Elias Villiger, Mabliny Thuany, Volker Scheer, Matthias Wilhelm, Marilia Andrade, Ivan Cuk, Pantelis T. Nikolaidis, Beat Knechtle
Format: Article
Language: English
Published: Nature Portfolio 2025-03-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-025-92581-w
collection DOAJ
description Abstract Although the 50-mile ultramarathon is one of the most common race distances, it has received little scientific attention. The objective of this study was to assess how an athlete’s age group, sex, and nationality, together with the race location, affect race speed. Using a dataset of ultramarathon races from 1863 to 2022, a machine learning model based on the XGBoost algorithm was developed to predict race speed from these variables. Model explainability tools, including relative feature importances and prediction distribution plots, were then used to investigate how each feature affects the predicted race speed. The most important features, with respect to the predictive power of the XGBoost model, were the location of the race and the athlete’s sex. The three countries with the fastest predicted median race speeds were Slovenia, New Zealand, and Bulgaria for nationality, and New Zealand, Croatia, and Serbia for race location. The fastest median race speed was predicted for the age group 20–24 years, but a marked age-related performance decline only became apparent from the age group 40–44 years onward. Model predictions for male athletes were faster than for female athletes. This study offers insights into factors influencing race speed in 50-mile ultramarathons, which may be beneficial for athletes, coaches, and race organizers. The identification of nationalities and event countries with fast race speeds provides a foundation for further exploration in the field of ultramarathon events.
id doaj-art-a7a8f33ae8624d51a0f3efc4ba5418a7
institution DOAJ
issn 2045-2322
spelling doaj-art-a7a8f33ae8624d51a0f3efc4ba5418a7; 2025-08-20T02:56:09Z; eng; Nature Portfolio; Scientific Reports; ISSN 2045-2322; 2025-03-01; vol. 15, no. 1, pp. 1–11; 10.1038/s41598-025-92581-w
Analysis of the 50-mile ultramarathon distance using a predictive XGBoost model
Author affiliations:
Jonas Turnwald: Centre for Rehabilitation and Sports Medicine, University Hospital Bern, Inselspital Bern, University of Bern
David Valero: Ultra Sports Science Foundation
Pedro Forte: Higher Institute of Educational Sciences of the Douro
Katja Weiss: Institute of Primary Care, University of Zurich
Elias Villiger: Institute of Primary Care, University of Zurich
Mabliny Thuany: Faculty of Sports, University of Porto
Volker Scheer: Ultra Sports Science Foundation
Matthias Wilhelm: Centre for Rehabilitation and Sports Medicine, University Hospital Bern, Inselspital Bern, University of Bern
Marilia Andrade: Physiology Department, Federal University of Sao Paulo
Ivan Cuk: Faculty of Sport and Physical Education, University of Belgrade
Pantelis T. Nikolaidis: School of Health and Caring Sciences, University of West Attica
Beat Knechtle: Institute of Primary Care, University of Zurich
https://doi.org/10.1038/s41598-025-92581-w
title Analysis of the 50-mile ultramarathon distance using a predictive XGBoost model
url https://doi.org/10.1038/s41598-025-92581-w