LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data
Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task.
Main Authors: | Jihao Li, Wenkai Zhang, Weihang Zhang, Ruixue Zhou, Chongyang Li, Boyuan Tong, Xian Sun, Kun Fu |
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
Subjects: | Deep learning (DL), learnable fusion, multimodal data, remote sensing, semantic segmentation |
Online Access: | https://ieeexplore.ieee.org/document/10833730/ |
_version_ | 1832586845790142464 |
author | Jihao Li, Wenkai Zhang, Weihang Zhang, Ruixue Zhou, Chongyang Li, Boyuan Tong, Xian Sun, Kun Fu |
author_facet | Jihao Li, Wenkai Zhang, Weihang Zhang, Ruixue Zhou, Chongyang Li, Boyuan Tong, Xian Sun, Kun Fu |
author_sort | Jihao Li |
collection | DOAJ |
description | Semantic segmentation of remote sensing images plays a significant role in many applications, such as land cover mapping, land use analysis, and smoke detection. With ever-growing remote sensing data, fusing multimodal data from different sensors is a feasible and effective scheme for the semantic segmentation task. Deep learning technology has prominently promoted the development of semantic segmentation. However, the majority of current approaches focus more on feature mixing and construct relatively complex architectures, while the mining of cross-modal features remains comparatively insufficient in heterogeneous data fusion. In addition, complex structures lead to a relatively heavy computation burden. Therefore, in this article, we propose an end-to-end learnable multimodal fusion network (LMF-Net) for remote sensing semantic segmentation. Concretely, we first develop a multiscale pooling fusion module by leveraging the pooling operator. It provides key-value pairs with multimodal complementary information in a parameter-free manner and assigns them to the self-attention (SA) layers of different modal branches. Then, to further harness cross-modal collaborative embeddings and features, we elaborate two learnable fusion modules: learnable embedding fusion and learnable feature fusion. They dynamically adjust the collaborative relationships of different modal embeddings and features, respectively, in a learnable manner. Experiments on two well-established benchmark datasets reveal that LMF-Net delivers superior segmentation performance and strong generalization capability, while achieving competitive computation complexity. Finally, the contribution of each component of LMF-Net is evaluated and discussed in detail. |
format | Article |
id | doaj-art-28522c30e1894684ac550f53891ea35b |
institution | Kabale University |
issn | 1939-1404 2151-1535 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing |
spelling | doaj-art-28522c30e1894684ac550f53891ea35b; indexed 2025-01-25T00:00:05Z; eng; IEEE; IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; ISSN 1939-1404, 2151-1535; published 2025-01-01; vol. 18, pp. 3905-3920; DOI 10.1109/JSTARS.2025.3527213; document 10833730; LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data; Jihao Li (https://orcid.org/0000-0002-8277-4223), Wenkai Zhang (https://orcid.org/0000-0002-8903-2708), Weihang Zhang (https://orcid.org/0009-0005-1171-1734), Ruixue Zhou, Chongyang Li (https://orcid.org/0009-0003-8234-4420), Boyuan Tong (https://orcid.org/0009-0000-5100-5918), Xian Sun (https://orcid.org/0000-0002-0038-9816), Kun Fu; all authors: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China; https://ieeexplore.ieee.org/document/10833730/; Deep learning (DL); learnable fusion; multimodal data; remote sensing; semantic segmentation |
spellingShingle | Jihao Li Wenkai Zhang Weihang Zhang Ruixue Zhou Chongyang Li Boyuan Tong Xian Sun Kun Fu LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing Deep learning (DL) learnable fusion multimodal data remote sensing semantic segmentation |
title | LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data |
title_full | LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data |
title_fullStr | LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data |
title_full_unstemmed | LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data |
title_short | LMF-Net: A Learnable Multimodal Fusion Network for Semantic Segmentation of Remote Sensing Data |
title_sort | lmf net a learnable multimodal fusion network for semantic segmentation of remote sensing data |
topic | Deep learning (DL) learnable fusion multimodal data remote sensing semantic segmentation |
url | https://ieeexplore.ieee.org/document/10833730/ |
work_keys_str_mv | AT jihaoli lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT wenkaizhang lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT weihangzhang lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT ruixuezhou lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT chongyangli lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT boyuantong lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT xiansun lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata AT kunfu lmfnetalearnablemultimodalfusionnetworkforsemanticsegmentationofremotesensingdata |
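The abstract describes learnable fusion modules that dynamically adjust the collaborative relationships of different modal embeddings and features. The paper's actual implementation is not reproduced in this record; as an illustrative sketch only, such fusion is often realized as a combination of modality features weighted by softmax-normalized learnable parameters. The function names and the scalar-logit formulation below are assumptions for illustration, not LMF-Net's API.

```python
import math

def softmax(logits):
    """Normalize a list of logits into weights that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def learnable_fusion(feat_a, feat_b, logits):
    """Fuse two same-length modality feature vectors with
    softmax-normalized weights. Here the logits are plain floats;
    in a real network they would be trained parameters updated
    by backpropagation along with the rest of the model."""
    w_a, w_b = softmax(logits)
    return [w_a * a + w_b * b for a, b in zip(feat_a, feat_b)]

# Equal logits reduce to a simple average of the two modalities;
# training would shift the weights toward the more informative one.
fused = learnable_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])
print(fused)  # [2.0, 3.0]
```

The design choice sketched here keeps the fusion parameter count tiny (one logit per modality), which is consistent with the abstract's emphasis on limiting computation burden; the paper may instead learn per-channel or per-token weights.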