Comparative Evaluation of Mean Cumulative Regret in Multi-Armed Bandit Algorithms: ETC, UCB, Asymptotically Optimal UCB, and TS

This research provides insights into how to address short-term and long-term decision-making in different kinds of the Multi-Armed Bandit (MAB) problem, a classic problem in decision-making under uncertainty. In this study, four algorithms - Explore-Then-Commit (ETC), the Upper Confidence Bound (UCB...

Full description

Saved in:
Bibliographic Details
Main Author: Lei Yicong
Format: Article
Language:English
Published: EDP Sciences 2025-01-01
Series:ITM Web of Conferences
Online Access:https://www.itm-conferences.org/articles/itmconf/pdf/2025/04/itmconf_iwadi2024_01026.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This research provides insights into how to address short-term and long-term decision-making in different kinds of the Multi-Armed Bandit (MAB) problem, a classic problem in decision-making under uncertainty. In this study, four algorithms - Explore-Then-Commit (ETC), the Upper Confidence Bound (UCB), Asymptotically Optimal UCB, and Thompson Sampling Algorithms (TS) – are selected to solve the MAB problem with numerical and categorical types. Different types represent different value intervals. Each algorithm is applied to each dataset with two different horizons, which represent the number of iterations, to evaluate its short-term and long-term decision-making ability. All algorithms are then utilized in each dataset to compare which one is most suitable for solving a certain type of MAB problem. This research provides an explicit introduction to the MAB problem and the four algorithms. Furthermore, it concludes that both Asymptotically Optimal UCB and TS are suitable for decision-making in the short and long term. At the same time, Asymptotically Optimal UCB is the most appropriate for the numerical MAB problem, while TS is the most appropriate for the categorical MAB problem. Additionally, UCB only suits short-term decision-making, ETC can be efficient only in the numerical MAB problem.
ISSN:2271-2097