A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to “mask out” invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking.
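The masking described in the abstract is commonly implemented by overwriting the logits of invalid actions with a large negative constant before sampling, so the softmax assigns them (near-)zero probability and the policy-gradient update flows only through valid actions. Below is a minimal sketch, assuming a PyTorch policy over a discrete action space; the names MASK_VALUE and sample_masked_action are illustrative and not taken from the paper.

    # Minimal sketch of invalid action masking for a discrete policy (assumed PyTorch setup).
    import torch
    from torch.distributions import Categorical

    MASK_VALUE = -1e8  # large negative logit; softmax maps it to ~0 probability

    def sample_masked_action(logits: torch.Tensor, valid_mask: torch.Tensor):
        """Sample only from valid actions.

        logits:     (batch, num_actions) raw policy outputs
        valid_mask: (batch, num_actions) boolean, True where the action is valid
        """
        # Replace logits of invalid actions so they receive (near-)zero probability
        masked_logits = torch.where(valid_mask, logits, torch.full_like(logits, MASK_VALUE))
        dist = Categorical(logits=masked_logits)
        action = dist.sample()
        # log_prob/entropy come from the masked distribution, so the policy-gradient
        # update only flows through valid actions
        return action, dist.log_prob(action), dist.entropy()

    # Usage: 4 actions, the last one (e.g., "walk into a wall") is invalid
    logits = torch.randn(1, 4)
    mask = torch.tensor([[True, True, True, False]])
    action, logp, ent = sample_masked_action(logits, mask)

Note that in this sketch the mask is part of the forward pass (the masked distribution produces the log-probabilities used for the update), rather than a filter applied after sampling.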


Bibliographic Details
Main Authors: Shengyi Huang, Santiago Ontañón
Format: Article
Language:English
Published: LibraryPress@UF 2022-05-01
Series:Proceedings of the International Florida Artificial Intelligence Research Society Conference
Subjects: reinforcement learning; deep learning; deep reinforcement learning; real-time strategy games; implementation details; invalid action masking
Online Access:https://journals.flvc.org/FLAIRS/article/view/130584
_version_ 1850270998473998336
author Shengyi Huang
Santiago Ontañón
author_facet Shengyi Huang
Santiago Ontañón
author_sort Shengyi Huang
collection DOAJ
description In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to “mask out” invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking.
format Article
id doaj-art-e0cd53b0c8a74492a0f79014d3c13cbd
institution OA Journals
issn 2334-0754
2334-0762
language English
publishDate 2022-05-01
publisher LibraryPress@UF
record_format Article
series Proceedings of the International Florida Artificial Intelligence Research Society Conference
spelling doaj-art-e0cd53b0c8a74492a0f79014d3c13cbd | 2025-08-20T01:52:22Z | eng | LibraryPress@UF | Proceedings of the International Florida Artificial Intelligence Research Society Conference | 2334-0754 | 2334-0762 | 2022-05-01 | Vol. 35 | doi:10.32473/flairs.v35i.130584 | 66783 | A Closer Look at Invalid Action Masking in Policy Gradient Algorithms | Shengyi Huang (Drexel University) | Santiago Ontañón (Drexel University) | In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to “mask out” invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking. | https://journals.flvc.org/FLAIRS/article/view/130584 | reinforcement learning | deep learning | deep reinforcement learning | real-time strategy games | implementation details | invalid action masking
spellingShingle Shengyi Huang
Santiago Ontañón
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
Proceedings of the International Florida Artificial Intelligence Research Society Conference
reinforcement learning
deep learning
deep reinforcement learning
real-time strategy games
implementation details
invalid action masking
title A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
title_full A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
title_fullStr A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
title_full_unstemmed A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
title_short A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
title_sort closer look at invalid action masking in policy gradient algorithms
topic reinforcement learning
deep learning
deep reinforcement learning
real-time strategy games
implementation details
invalid action masking
url https://journals.flvc.org/FLAIRS/article/view/130584
work_keys_str_mv AT shengyihuang acloserlookatinvalidactionmaskinginpolicygradientalgorithms
AT santiagoontanon acloserlookatinvalidactionmaskinginpolicygradientalgorithms
AT shengyihuang closerlookatinvalidactionmaskinginpolicygradientalgorithms
AT santiagoontanon closerlookatinvalidactionmaskinginpolicygradientalgorithms