LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data

Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing volumes of remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning has markedly advanced the development of semantic segmentation. However, most current approaches focus on feature mixing and construct relatively complex architectures, while deeper mining of cross-modal features in heterogeneous data fusion remains insufficient. In addition, complex structures impose a relatively heavy computational burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module that leverages pooling operators. It provides key-value pairs carrying multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of the different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we design two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of the different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets show that LMF-Net achieves superior segmentation performance and strong generalization capability, while remaining competitive in computational complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail.

Saved in:
Bibliographic Details
Main Authors: Jihao Li, Wenkai Zhang, Weihang Zhang, Ruixue Zhou, Chongyang Li, Boyuan Tong, Xian Sun, Kun Fu
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10833730/
author Jihao Li
Wenkai Zhang
Weihang Zhang
Ruixue Zhou
Chongyang Li
Boyuan Tong
Xian Sun
Kun Fu
collection DOAJ
description Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing volumes of remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning has markedly advanced the development of semantic segmentation. However, most current approaches focus on feature mixing and construct relatively complex architectures, while deeper mining of cross-modal features in heterogeneous data fusion remains insufficient. In addition, complex structures impose a relatively heavy computational burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module that leverages pooling operators. It provides key-value pairs carrying multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of the different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we design two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of the different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets show that LMF-Net achieves superior segmentation performance and strong generalization capability, while remaining competitive in computational complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail.
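The multiscale pooling fusion step described in the abstract — deriving key-value pairs from pooled multimodal features without extra parameters — could be sketched roughly as below. This is an illustrative approximation, not the authors' implementation: the function name `multiscale_pool_kv`, the simple averaging of the two modal maps, and the pooling scales are all assumptions.

```python
import numpy as np

def multiscale_pool_kv(feat_a, feat_b, scales=(1, 2, 4)):
    """Parameter-free sketch of multiscale pooling fusion.

    feat_a, feat_b: (C, H, W) feature maps from two modal branches.
    Returns fused key/value tokens of shape (N, C), where N is the
    total number of pooled positions over all scales.
    """
    fused = (feat_a + feat_b) / 2.0  # simple complementary mix (assumption)
    C, H, W = fused.shape
    tokens = []
    for s in scales:
        # average-pool the fused map down to an s x s grid
        hs, ws = H // s, W // s
        pooled = fused[:, :s * hs, :s * ws].reshape(C, s, hs, s, ws).mean(axis=(2, 4))
        tokens.append(pooled.reshape(C, -1).T)  # (s*s, C) tokens at this scale
    return np.concatenate(tokens, axis=0)  # (sum of s^2, C) key/value tokens
```

Tokens produced this way could then serve as the shared keys and values for the SA layers of each modal branch, while each branch keeps its own queries.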
format Article
id doaj-art-28522c30e1894684ac550f53891ea35b
institution Kabale University
issn 1939-1404, 2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-28522c30e1894684ac550f53891ea35b
Updated: 2025-01-25T00:00:05Z
Language: eng
Publisher: IEEE
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ISSN: 1939-1404, 2151-1535
Published: 2025-01-01, vol. 18, pp. 3905-3920
DOI: 10.1109/JSTARS.2025.3527213
IEEE document: 10833730
Title: LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data
Authors (all with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China):
Jihao Li (https://orcid.org/0000-0002-8277-4223)
Wenkai Zhang (https://orcid.org/0000-0002-8903-2708)
Weihang Zhang (https://orcid.org/0009-0005-1171-1734)
Ruixue Zhou
Chongyang Li (https://orcid.org/0009-0003-8234-4420)
Boyuan Tong (https://orcid.org/0009-0000-5100-5918)
Xian Sun (https://orcid.org/0000-0002-0038-9816)
Kun Fu
Abstract: Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing volumes of remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning has markedly advanced the development of semantic segmentation. However, most current approaches focus on feature mixing and construct relatively complex architectures, while deeper mining of cross-modal features in heterogeneous data fusion remains insufficient. In addition, complex structures impose a relatively heavy computational burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module that leverages pooling operators. It provides key-value pairs carrying multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of the different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we design two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of the different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets show that LMF-Net achieves superior segmentation performance and strong generalization capability, while remaining competitive in computational complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail.
Online access: https://ieeexplore.ieee.org/document/10833730/
Keywords: Deep learning (DL); learnable fusion; multimodal data; remote sensing; semantic segmentation
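The learnable fusion modules are described only at a high level: they dynamically adjust the collaborative relationship between modal embeddings/features in a trainable way. A minimal sketch of one plausible mechanism — a softmax over trainable per-modality scalars that weights the two modal features — is shown below. The paper's actual module design is not specified here, so the function names and the scalar-weighting scheme are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def learnable_fusion(feat_a, feat_b, logits):
    """Sketch of a learnable fusion step (assumed mechanism).

    feat_a, feat_b: same-shaped feature arrays from two modal branches.
    logits: two trainable scalars; their softmax gives the fusion weights,
    so the balance between modalities is learned rather than fixed.
    """
    w = softmax(np.asarray(logits, dtype=float))
    return w[0] * feat_a + w[1] * feat_b
```

In a real network the `logits` would be `nn.Parameter`-style values updated by backpropagation; with equal logits the module degenerates to a plain average of the two modalities.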
title LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data
topic Deep learning (DL)
learnable fusion
multimodal data
remote sensing
semantic segmentation
url https://ieeexplore.ieee.org/document/10833730/