A Multi-Agent Approach to Modeling Task-Oriented Dialog Policy Learning
Dialogue policy is a critical research area in human-computer interaction, vital for guiding dialogue generation and improving controllability and interpretability. Multi-agent dialogue policy learning demonstrates superior learning speed and exploration capabilities, positioning it as a promising approach for developing more effective and adaptive dialogue agents. However, many studies neglect to holistically model collaboration between agents, which limits the effectiveness of policy learning. Therefore, this paper proposes a new multi-agent group collaboration mechanism for dialogue policy learning, named GMPL. Concretely, we employ an Actor-Critic network to implement the proposed model, alternately updating individual dialogue agents to optimize policy selection. In each update, we utilize the maximum action value function to determine the appropriate dialogue action, while the maximum state value function serves to guide the policy learning process. This integrated approach ensures that both decision-making and learning phases are effectively aligned, thereby enhancing the overall performance of the dialogue agents. Furthermore, we conduct a theoretical analysis of the convergence properties of the proposed model. Experiments were conducted on two distinct task-oriented dialogue datasets, revealing that the proposed multi-agent model exhibits a significantly faster learning speed and a higher dialogue success rate compared to baseline approaches.
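The abstract describes an alternating Actor-Critic scheme in which the maximum action value selects each dialogue action and the maximum state value guides the learning target. The sketch below only illustrates that update structure; it is not the authors' GMPL implementation, and the toy environment, state/action sizes, reward, and hyperparameters are invented placeholders for illustration.

```python
# Minimal sketch (NOT the paper's GMPL): alternating actor-critic updates for
# multiple dialogue agents, where the greedy (maximum) action value picks the
# dialogue action and the maximum state value bootstraps the learning target.
# The toy environment, reward, and state/action spaces are placeholders.
import numpy as np

N_STATES, N_ACTIONS, N_AGENTS = 8, 4, 2
GAMMA, LR_CRITIC, LR_ACTOR = 0.95, 0.1, 0.05
rng = np.random.default_rng(0)

# Per-agent critic Q(s, a) and actor preferences h(s, a) (softmax policy).
Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]
H = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def toy_step(state, action):
    """Placeholder dialogue environment: random transition, +1 on a 'success' state."""
    next_state = int(rng.integers(N_STATES))
    reward = 1.0 if next_state == N_STATES - 1 else -0.05
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(200):
    state = int(rng.integers(N_STATES))
    for turn in range(20):
        agent = turn % N_AGENTS              # agents are updated alternately
        # Decision: take the action with the maximum action value.
        action = int(np.argmax(Q[agent][state]))
        next_state, reward, done = toy_step(state, action)
        # Learning target guided by the maximum state value of the next state.
        v_next = 0.0 if done else np.max(Q[agent][next_state])
        td_error = reward + GAMMA * v_next - Q[agent][state, action]
        Q[agent][state, action] += LR_CRITIC * td_error
        # Actor update: nudge the softmax policy toward actions with positive TD error
        # (the policy is learned here, while the greedy value picks the action above).
        pi = softmax(H[agent][state])
        grad = -pi
        grad[action] += 1.0
        H[agent][state] += LR_ACTOR * td_error * grad
        state = next_state
        if done:
            break
```

In the paper itself, the agents share a group collaboration mechanism and are implemented with neural Actor-Critic networks; the tabular form above only mirrors the alternating update described in the abstract.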
Main Authors: | Songfeng Liang, Kai Xu, Zhurong Dong |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Human-computer interaction; dialogue policy learning; deep reinforcement learning; multi-agent learning |
Online Access: | https://ieeexplore.ieee.org/document/10840219/ |
_version_ | 1832590313584066560 |
---|---|
author | Songfeng Liang, Kai Xu, Zhurong Dong |
author_facet | Songfeng Liang, Kai Xu, Zhurong Dong |
author_sort | Songfeng Liang |
collection | DOAJ |
description | Dialogue policy is a critical research area in human-computer interaction, vital for guiding dialogue generation and improving controllability and interpretability. Multi-agent dialogue policy learning demonstrates superior learning speed and exploration capabilities, positioning it as a promising approach for developing more effective and adaptive dialogue agents. However, many studies neglect to holistically model collaboration between agents, which limits the effectiveness of policy learning. Therefore, this paper proposes a new multi-agent group collaboration mechanism for dialogue policy learning, named GMPL. Concretely, we employ an Actor-Critic network to implement the proposed model, alternately updating individual dialogue agents to optimize policy selection. In each update, we utilize the maximum action value function to determine the appropriate dialogue action, while the maximum state value function serves to guide the policy learning process. This integrated approach ensures that both decision-making and learning phases are effectively aligned, thereby enhancing the overall performance of the dialogue agents. Furthermore, we conduct a theoretical analysis of the convergence properties of the proposed model. Experiments were conducted on two distinct task-oriented dialogue datasets, revealing that the proposed multi-agent model exhibits a significantly faster learning speed and a higher dialogue success rate compared to baseline approaches. |
format | Article |
id | doaj-art-28d8c72f1e974ebe8c81c7babbe6348d |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-28d8c72f1e974ebe8c81c7babbe6348d; 2025-01-24T00:01:43Z; English; IEEE; IEEE Access; ISSN 2169-3536; published 2025-01-01; vol. 13, pp. 11754-11764; DOI 10.1109/ACCESS.2025.3529469; IEEE document 10840219. A Multi-Agent Approach to Modeling Task-Oriented Dialog Policy Learning. Songfeng Liang (School of Automotive and Transportation Engineering, Shenzhen Polytechnic University, Shenzhen, China); Kai Xu (School of Software Engineering, South China University of Technology, Guangzhou, China; https://orcid.org/0000-0001-8933-6704); Zhurong Dong (School of Automotive and Transportation Engineering, Shenzhen Polytechnic University, Shenzhen, China). Abstract as in the description field above. https://ieeexplore.ieee.org/document/10840219/ Subjects: Human-computer interaction; dialogue policy learning; deep reinforcement learning; multi-agent learning |
title | A Multi-Agent Approach to Modeling Task-Oriented Dialog Policy Learning |
topic | Human-computer interaction; dialogue policy learning; deep reinforcement learning; multi-agent learning |
url | https://ieeexplore.ieee.org/document/10840219/ |