Rethinking Exploration and Experience Exploitation in Value-Based Multi-Agent Reinforcement Learning

Bibliographic Details
Main Authors: Anatolii Borzilov, Alexey Skrynnik, Aleksandr Panov
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Exploration; multi-agent reinforcement learning; value based methods
Online Access: https://ieeexplore.ieee.org/document/10844859/
author Anatolii Borzilov
Alexey Skrynnik
Aleksandr Panov
author_sort Anatolii Borzilov
collection DOAJ
description Cooperative Multi-Agent Reinforcement Learning (MARL) focuses on developing strategies to effectively train multiple agents to learn and adapt policies collaboratively. Despite being a relatively new area of research, most MARL methods build on well-established approaches from single-agent deep reinforcement learning due to their proven effectiveness. In this paper, we focus on the exploration problem inherent in many MARL algorithms. These algorithms often introduce new hyperparameters and auxiliary components, such as additional models, which complicate adapting the underlying RL algorithm to multi-agent environments. We aim to optimize a deep MARL algorithm with minimal modifications to the well-known QMIX approach. Our investigation of the exploration-exploitation dilemma shows that the performance of state-of-the-art MARL algorithms can be matched by a simple modification of the $\epsilon$-greedy policy, which depends on the ratio of available joint actions to the number of agents. We also improve how the replay buffer is used during training, decorrelating experiences by sampling recurrent rollouts rather than whole episodes. The improved algorithm is not only easy to implement but also matches state-of-the-art methods without adding significant complexity. Our approach outperforms existing algorithms in four of seven scenarios across three distinct environments while remaining competitive in the other three.
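The abstract names two changes to QMIX: an $\epsilon$-greedy policy whose schedule depends on the ratio of available joint actions to the number of agents, and a replay buffer that decorrelates experience by storing recurrent rollouts instead of whole episodes. The abstract does not give the exact formulas, so the Python sketch below is only an illustration of those two ideas under assumed functional forms; the schedule shape, the logarithmic scaling, all constants, and the class and function names are guesses, not the paper's actual method.

```python
import random
from collections import deque
from math import log1p


def scaled_epsilon(step, n_agents, n_actions,
                   eps_start=1.0, eps_end=0.05, base_anneal_steps=50_000):
    """Hypothetical epsilon schedule: the anneal length grows with the ratio
    of joint actions to agents (|A|^n / n). The exact form used in the paper
    is not stated in the abstract; the log scaling and constants are guesses."""
    joint_actions = n_actions ** n_agents      # size of the joint action space
    ratio = joint_actions / max(n_agents, 1)   # joint actions per agent
    anneal_steps = base_anneal_steps * log1p(ratio)
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


class RolloutReplayBuffer:
    """Replay buffer storing fixed-length recurrent rollouts rather than whole
    episodes, so sampled minibatches are less temporally correlated."""

    def __init__(self, capacity=5_000, rollout_len=16):
        self.rollout_len = rollout_len
        self.rollouts = deque(maxlen=capacity)
        self._current = []

    def add_step(self, transition):
        """Append one per-step transition; flush a rollout when it is full."""
        self._current.append(transition)
        if len(self._current) == self.rollout_len:
            self.rollouts.append(list(self._current))
            self._current.clear()

    def sample(self, batch_size):
        """Sample independent rollouts for truncated-BPTT value updates."""
        k = min(batch_size, len(self.rollouts))
        return random.sample(list(self.rollouts), k)
```

Hypothetical usage: call `scaled_epsilon(t, n_agents=5, n_actions=10)` before each environment step, push per-step transitions into `RolloutReplayBuffer`, and sample rollouts for the recurrent QMIX update.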
format Article
id doaj-art-025e3ecd7f0f4801acd6141a97ee7ba5
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-025e3ecd7f0f4801acd6141a97ee7ba5 (record created 2025-01-29T00:01:03Z)
English | IEEE | IEEE Access | ISSN 2169-3536 | published 2025-01-01 | vol. 13, pp. 13770-13781 | DOI 10.1109/ACCESS.2025.3530974 | IEEE document 10844859
Rethinking Exploration and Experience Exploitation in Value-Based Multi-Agent Reinforcement Learning
Anatolii Borzilov (https://orcid.org/0009-0000-7032-7314), Alexey Skrynnik, Aleksandr Panov (https://orcid.org/0000-0002-9747-3837)
Federal Research Center "Computer Science and Control" of the Russian Academy of Sciences, Moscow, Russia
https://ieeexplore.ieee.org/document/10844859/
title Rethinking Exploration and Experience Exploitation in Value-Based Multi-Agent Reinforcement Learning
title_sort rethinking exploration and experience exploitation in value based multi agent reinforcement learning
topic Exploration
multi-agent reinforcement learning
value based methods
url https://ieeexplore.ieee.org/document/10844859/