Dual-branch graph Transformer for node classification
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | AIMS Press, 2025-02-01 |
| Series: | Electronic Research Archive |
| Subjects: | |
| Online Access: | https://www.aimspress.com/article/doi/10.3934/era.2025049 |
| Summary: | As an emerging architecture, graph Transformers (GTs) have demonstrated significant potential in various graph-related tasks. Existing GTs are mainly oriented toward graph-level tasks, where they have proved advantageous, but they perform poorly on node classification. This stems from two issues: (1) the global attention mechanism causes the computational complexity to grow quadratically with the number of nodes, resulting in substantial resource demands, especially on large-scale graphs; (2) a large number of long-distance irrelevant nodes disperse the attention weights and weaken the focus on local neighborhoods. To address these issues, we propose a new model, the dual-branch graph Transformer (DCAFormer). The model partitions the graph into clusters with the same number of nodes using a graph partitioning algorithm, reducing the number of input nodes. The original graph is then processed by a graph neural network (GNN) to obtain outputs containing structural information. Next, we adopt a dual-branch architecture: the local branch (intracluster Transformer) captures local information within each cluster, reducing the impact of long-distance irrelevant nodes on attention, while the global branch (intercluster Transformer) captures global interactions across clusters. In addition, we design a hybrid feature mechanism that integrates the original features with the GNN outputs and separately optimizes the construction of the query ($ Q $), key ($ K $), and value ($ V $) matrices of the intracluster and intercluster Transformers to suit the different modeling requirements of the two branches. Extensive experiments on 8 benchmark node classification datasets show that DCAFormer outperforms existing GTs and mainstream GNNs. |
| ISSN: | 2688-1594 |
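
The abstract describes the architecture only at a high level. As a rough illustration of the dual-branch idea (cluster-local attention plus cross-cluster attention over pooled cluster summaries, with different feature sources feeding the Q/K/V projections), the sketch below shows one possible PyTorch layer. Every name, shape, and design choice here is an assumption made for illustration; it is not the authors' implementation.

```python
# Minimal sketch of a dual-branch (intracluster / intercluster) attention layer.
# All class names, tensor layouts, and the mean-pooling of cluster summaries are
# assumptions, not details taken from the DCAFormer paper.
import torch
import torch.nn as nn


class DualBranchLayer(nn.Module):
    """Local branch: attention within each cluster.
    Global branch: attention across mean-pooled cluster summaries."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.intra_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.inter_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mix = nn.Linear(2 * dim, dim)

    def forward(self, x_orig: torch.Tensor, x_gnn: torch.Tensor) -> torch.Tensor:
        # x_orig, x_gnn: (num_clusters, cluster_size, dim)
        # Hybrid features: one possible split is GNN outputs as queries and raw
        # node features as keys/values (an assumption about the Q/K/V design).
        local, _ = self.intra_attn(x_gnn, x_orig, x_orig)            # within clusters
        summary = x_gnn.mean(dim=1, keepdim=True).transpose(0, 1)    # (1, C, dim)
        global_ctx, _ = self.inter_attn(summary, summary, summary)   # across clusters
        global_ctx = global_ctx.transpose(0, 1).expand_as(local)     # back to nodes
        return self.mix(torch.cat([local, global_ctx], dim=-1))


if __name__ == "__main__":
    C, S, D = 8, 16, 64  # clusters, nodes per cluster, feature dimension
    layer = DualBranchLayer(D)
    out = layer(torch.randn(C, S, D), torch.randn(C, S, D))
    print(out.shape)  # torch.Size([8, 16, 64])
```

Restricting full attention to nodes within a cluster (size $ S $) and letting clusters interact only through pooled summaries is what keeps the cost well below the quadratic-in-$ N $ cost of global attention mentioned in the abstract.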