Dueling Network Architecture for GNN in the Deep Reinforcement Learning for the Automated ICT System Design

Bibliographic Details
Main Authors: Tianchen Zhou, Yutaka Yakuwa, Natsuki Okamura, Hiroyuki Hochigai, Takayuki Kuroda, Ikuko Eguchi Yairi
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10854435/
Description
Summary: This paper presents an improved deep reinforcement learning (DRL)-based approach for end-to-end models using a Graph Neural Network (GNN). The proposed method aims to improve end-to-end deep Q-learning with a GNN by decomposing the GNN-based Q-network into two sub-streams that separately estimate the global state value and the state-dependent action advantage. With this decomposition, the dueling GNN architecture can learn which states are valuable independently of the effect of each action in each state, because it uses a graph-dependent global state value. This provides a more accurate approximation of the Q-value. With better Q-value approximation, the network can cope with a massive state space and sparse rewards, and it achieves significantly higher learning efficiency without any change to the underlying reinforcement learning algorithm. The proposed method was applied to an automated ICT system design model. That model suffers from prolonged learning times, primarily because it tends to overestimate particular configurations: rewards are scarce, while the exploration space covers a vast number of possible combinations of ICT system components. The results show that the proposed architecture effectively improves the learning efficiency of the DRL model, again with no change to the underlying reinforcement learning algorithm.
ISSN: 2169-3536
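
The two-stream decomposition described in the summary can be illustrated with a small sketch. This is not the authors' implementation: the record does not say how the streams are recombined, so the code assumes the standard dueling aggregation Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). It also assumes, purely for illustration, that node embeddings have already been produced by some GNN, that the global state value is read from a mean-pooled graph vector, and that each candidate action is scored from one node's embedding. All function and parameter names are hypothetical.

```python
from statistics import fmean


def mean_pool(node_embeddings):
    """Average all node embeddings into one graph-level vector (assumed readout)."""
    dim = len(node_embeddings[0])
    return [fmean(h[i] for h in node_embeddings) for i in range(dim)]


def linear(weights, bias, x):
    """Scalar-output linear layer: dot(weights, x) + bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def dueling_q_values(node_embeddings, action_nodes, value_params, advantage_params):
    """Combine a global value stream and a per-action advantage stream.

    value_params and advantage_params are (weights, bias) pairs; both are
    hypothetical stand-ins for the learned heads of each stream.
    """
    # Value stream: one scalar V(s) for the whole graph.
    state_value = linear(*value_params, mean_pool(node_embeddings))

    # Advantage stream: one scalar A(s, a) per candidate action,
    # here scored from the embedding of the node the action touches.
    advantages = [linear(*advantage_params, node_embeddings[n]) for n in action_nodes]

    # Standard dueling aggregation: subtracting the mean advantage keeps
    # V and A identifiable, so the Q-values average to V(s).
    mean_advantage = fmean(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Toy graph with three nodes and 2-dimensional embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
q = dueling_q_values(
    embeddings,
    action_nodes=[0, 1, 2],
    value_params=([1.0, 1.0], 0.5),
    advantage_params=([2.0, -1.0], 0.0),
)
best_action = max(range(len(q)), key=q.__getitem__)
```

Because only the network head changes, a sketch like this slots into an existing deep Q-learning loop without touching the update rule, which matches the summary's point that the underlying reinforcement learning algorithm is left unchanged.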