A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
In recent years, Deep Reinforcement Learning (DRL) algorithms have achieved state-of-the-art performance in many challenging strategy games. Because these games have complicated rules, an action sampled from the full discrete action distribution predicted by the learned policy is likely to be invalid according to the game rules (e.g., walking into a wall). The usual approach to deal with this problem in policy gradient algorithms is to “mask out” invalid actions and just sample from the set of valid actions. The implications of this process, however, remain under-investigated. In this paper, we 1) show theoretical justification for such a practice, 2) empirically demonstrate its importance as the space of invalid actions grows, and 3) provide further insights by evaluating different action masking regimes, such as removing masking after an agent has been trained using masking.
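The masking the abstract describes is commonly implemented by replacing the logits of invalid actions with a large negative constant before the softmax, so invalid actions receive (effectively) zero probability and sampling is restricted to the valid set. A minimal NumPy sketch of this idea (the function name and values are illustrative, not taken from the paper):

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Replace logits of invalid actions with a large negative number,
    so the softmax assigns them effectively zero probability."""
    masked_logits = np.where(valid_mask, logits, -1e8)
    # numerically stable softmax over the masked logits
    exp = np.exp(masked_logits - masked_logits.max())
    return exp / exp.sum()

logits = np.array([1.0, 2.0, 0.5, 3.0])
valid = np.array([True, False, True, False])  # actions 1 and 3 are invalid
probs = masked_policy(logits, valid)
# probs over valid actions equal the softmax renormalized to the valid set;
# invalid actions get probability ~0
```

The resulting distribution is a proper renormalization of the policy over valid actions, which is the construction whose gradient implications the paper analyzes.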
| Main Authors: | Shengyi Huang, Santiago Ontañón |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | LibraryPress@UF, 2022-05-01 |
| Series: | Proceedings of the International Florida Artificial Intelligence Research Society Conference |
| Subjects: | reinforcement learning; deep learning; deep reinforcement learning; real-time strategy games; implementation details; invalid action masking |
| Online Access: | https://journals.flvc.org/FLAIRS/article/view/130584 |
| ISSN: | 2334-0754; 2334-0762 |
|---|---|
| DOI: | 10.32473/flairs.v35i.130584 |
| Volume: | 35 |
| Author Affiliations: | Shengyi Huang (Drexel University); Santiago Ontañón (Drexel University) |
| Collection: | DOAJ |