Multimodal Recommendation System Based on Cross Self-Attention Fusion

Recent advances in graph neural networks (GNNs) have enhanced the ability of multimodal recommendation systems to process complex user–item interactions. However, current approaches face two key limitations: they rely on static similarity metrics to construct product relationship graphs, and they struggle to effectively fuse information across modalities. We propose MR-CSAF, a multimodal recommendation algorithm based on cross-self-attention fusion. Building on FREEDOM, our approach introduces an adaptive modality selector that dynamically weights each modality's contribution to product similarity, enabling more accurate product relationship graphs and better-optimized modality representations. A cross-self-attention mechanism facilitates both inter- and intra-modal information transfer, while graph convolution incorporates the updated features into the item and product modal representations. Experimental results on three public datasets show that MR-CSAF outperforms eight baseline methods, validating its effectiveness for personalized recommendation in complex multimodal environments.
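The abstract above is the only technical description available in this record. As an illustration only, and not the authors' implementation, the following PyTorch sketch shows how the two mechanisms the abstract names might be realized: an adaptive modality selector that softmax-weights each modality's item–item similarity matrix, and a cross-self-attention block combining intra-modal self-attention with inter-modal cross-attention. All class names, dimensions, and design choices (e.g., AdaptiveModalitySelector, CrossSelfAttentionFusion, residual connections, top-k graph sparsification) are assumptions made for the sketch.

```python
# Hypothetical sketch of MR-CSAF-style components, based only on the record's abstract.
# Assumes two item modalities (visual, textual); all names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveModalitySelector(nn.Module):
    """Learns per-modality weights for a fused item-item similarity graph (assumed design)."""

    def __init__(self, num_modalities=2):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_modalities))  # learnable modality weights

    def forward(self, modality_feats):
        weights = F.softmax(self.logits, dim=0)        # weights sum to 1 across modalities
        sims = []
        for feat in modality_feats:                    # feat: (num_items, dim)
            feat = F.normalize(feat, dim=-1)           # cosine similarity per modality
            sims.append(feat @ feat.t())
        # Weighted sum of per-modality similarity matrices -> (num_items, num_items).
        return sum(w * s for w, s in zip(weights, sims))


class CrossSelfAttentionFusion(nn.Module):
    """Self-attention within each modality plus cross-attention between modalities."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.self_v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_vt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_tv = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, vis, txt):
        # Intra-modal information transfer (self-attention).
        vis_s, _ = self.self_v(vis, vis, vis)
        txt_s, _ = self.self_t(txt, txt, txt)
        # Inter-modal information transfer (cross-attention), with residual connections.
        vis_c, _ = self.cross_vt(vis_s, txt_s, txt_s)
        txt_c, _ = self.cross_tv(txt_s, vis_s, vis_s)
        return vis + vis_c, txt + txt_c


if __name__ == "__main__":
    num_items, dim = 100, 64
    vis = torch.randn(1, num_items, dim)   # projected visual item features (batch of 1)
    txt = torch.randn(1, num_items, dim)   # projected textual item features

    selector = AdaptiveModalitySelector(num_modalities=2)
    item_graph = selector([vis.squeeze(0), txt.squeeze(0)])   # weighted item-item similarity
    topk = item_graph.topk(k=10, dim=-1).indices              # keep 10 neighbours per item

    fusion = CrossSelfAttentionFusion(dim=dim, heads=4)
    vis_fused, txt_fused = fusion(vis, txt)
    print(item_graph.shape, topk.shape, vis_fused.shape, txt_fused.shape)
```

In such a design, the fused similarity graph and the fused modality representations would then feed a graph-convolution stage over the item graph; that stage and the FREEDOM backbone are omitted here because the record gives no further detail.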

Bibliographic Details
Main Authors: Peishan Li, Weixiao Zhan, Lutao Gao, Shuran Wang, Linnan Yang
Author Affiliations: Peishan Li, Lutao Gao, Shuran Wang, Linnan Yang (College of Big Data, Yunnan Agricultural University, Kunming 650201, China); Weixiao Zhan (College of Computer Science and Engineering, University of California, San Diego, CA 92093, USA)
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Systems
ISSN: 2079-8954
DOI: 10.3390/systems13010057
Subjects: graph neural networks; multimodal recommendation systems; attention mechanism; personalized recommendation
Online Access: https://www.mdpi.com/2079-8954/13/1/57
Collection: DOAJ