LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data

Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing volumes of remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning has markedly advanced the development of semantic segmentation. However, most current approaches focus on feature mixing and construct relatively complex architectures, while deeper mining of cross-modal features in heterogeneous data fusion remains insufficient. In addition, complex structures impose a relatively heavy computational burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module that leverages pooling operators. It provides key-value pairs carrying multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of the different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we design two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of the different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets show that LMF-Net achieves superior segmentation performance and strong generalization capability, while remaining competitive in computational complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail.

Saved in:
Bibliographic Details
Main Authors: Jihao Li, Wenkai Zhang, Weihang Zhang, Ruixue Zhou, Chongyang Li, Boyuan Tong, Xian Sun, Kun Fu
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10833730/
author Jihao Li
Wenkai Zhang
Weihang Zhang
Ruixue Zhou
Chongyang Li
Boyuan Tong
Xian Sun
Kun Fu
collection DOAJ
description Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing volumes of remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning has markedly advanced the development of semantic segmentation. However, most current approaches focus on feature mixing and construct relatively complex architectures, while deeper mining of cross-modal features in heterogeneous data fusion remains insufficient. In addition, complex structures impose a relatively heavy computational burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module that leverages pooling operators. It provides key-value pairs carrying multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of the different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we design two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of the different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets show that LMF-Net achieves superior segmentation performance and strong generalization capability, while remaining competitive in computational complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail.
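The multiscale pooling fusion step described in the abstract — deriving key-value pairs from pooled multimodal features without extra parameters — could be sketched roughly as below. This is an illustrative approximation, not the authors' implementation: the function name `multiscale_pool_kv`, the simple averaging of the two modal maps, and the pooling scales are all assumptions.

```python
import numpy as np

def multiscale_pool_kv(feat_a, feat_b, scales=(1, 2, 4)):
    """Parameter-free sketch of multiscale pooling fusion.

    feat_a, feat_b: (C, H, W) feature maps from two modal branches.
    Returns fused key/value tokens of shape (N, C), where N is the
    total number of pooled positions over all scales.
    """
    fused = (feat_a + feat_b) / 2.0  # simple complementary mix (assumption)
    C, H, W = fused.shape
    tokens = []
    for s in scales:
        # average-pool the fused map down to an s x s grid
        hs, ws = H // s, W // s
        pooled = fused[:, :s * hs, :s * ws].reshape(C, s, hs, s, ws).mean(axis=(2, 4))
        tokens.append(pooled.reshape(C, -1).T)  # (s*s, C) tokens at this scale
    return np.concatenate(tokens, axis=0)  # (sum of s^2, C) key/value tokens
```

Tokens produced this way could then serve as the shared keys and values for the SA layers of each modal branch, while each branch keeps its own queries.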
format Article
id doaj-art-28522c30e1894684ac550f53891ea35b
institution Kabale University
issn 1939-1404, 2151-1535
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
spelling doaj-art-28522c30e1894684ac550f53891ea35b
Updated: 2025-01-25T00:00:05Z
Language: eng
Publisher: IEEE
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ISSN: 1939-1404, 2151-1535
Published: 2025-01-01, vol. 18, pp. 3905-3920
DOI: 10.1109/JSTARS.2025.3527213
IEEE document: 10833730
Title: LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data
Authors (all with the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China):
Jihao Li (https://orcid.org/0000-0002-8277-4223)
Wenkai Zhang (https://orcid.org/0000-0002-8903-2708)
Weihang Zhang (https://orcid.org/0009-0005-1171-1734)
Ruixue Zhou
Chongyang Li (https://orcid.org/0009-0003-8234-4420)
Boyuan Tong (https://orcid.org/0009-0000-5100-5918)
Xian Sun (https://orcid.org/0000-0002-0038-9816)
Kun Fu
Abstract: Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing volumes of remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning has markedly advanced the development of semantic segmentation. However, most current approaches focus on feature mixing and construct relatively complex architectures, while deeper mining of cross-modal features in heterogeneous data fusion remains insufficient. In addition, complex structures impose a relatively heavy computational burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module that leverages pooling operators. It provides key-value pairs carrying multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of the different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we design two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of the different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets show that LMF-Net achieves superior segmentation performance and strong generalization capability, while remaining competitive in computational complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail.
Online access: https://ieeexplore.ieee.org/document/10833730/
Keywords: Deep learning (DL); learnable fusion; multimodal data; remote sensing; semantic segmentation
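The learnable fusion modules are described only at a high level: they dynamically adjust the collaborative relationship between modal embeddings/features in a trainable way. A minimal sketch of one plausible mechanism — a softmax over trainable per-modality scalars that weights the two modal features — is shown below. The paper's actual module design is not specified here, so the function names and the scalar-weighting scheme are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def learnable_fusion(feat_a, feat_b, logits):
    """Sketch of a learnable fusion step (assumed mechanism).

    feat_a, feat_b: same-shaped feature arrays from two modal branches.
    logits: two trainable scalars; their softmax gives the fusion weights,
    so the balance between modalities is learned rather than fixed.
    """
    w = softmax(np.asarray(logits, dtype=float))
    return w[0] * feat_a + w[1] * feat_b
```

In a real network the `logits` would be `nn.Parameter`-style values updated by backpropagation; with equal logits the module degenerates to a plain average of the two modalities.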
title LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data
topic Deep learning (DL)
learnable fusion
multimodal data
remote sensing
semantic segmentation
url https://ieeexplore.ieee.org/document/10833730/