AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction
The environmental perception system is a critical component of autonomous vehicles, and multimodal perception systems significantly enhance perception capabilities by integrating camera and LiDAR data. This paper proposes a novel framework, AlignFusionNet. It effectively combines image and point clo...
| Main Authors: | Ziyi Xu, Legan Qi, Hongzhou Du, Jiaqi Yang, Zhenglin Chen |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
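| Subjects: | 3D occupancy prediction; point cloud; multi-view image; multimodal feature alignment; cross-attention mechanisms |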
| Online Access: | https://ieeexplore.ieee.org/document/11082274/ |
| _version_ | 1849714500924604416 |
|---|---|
| author | Ziyi Xu; Legan Qi; Hongzhou Du; Jiaqi Yang; Zhenglin Chen |
| author_sort | Ziyi Xu |
| collection | DOAJ |
| description | The environmental perception system is a critical component of autonomous vehicles, and multimodal perception systems significantly enhance perception capabilities by integrating camera and LiDAR data. This paper proposes a novel framework, AlignFusionNet. It effectively combines image and point cloud data to construct an occupancy network, thereby improving target detection and representation. The framework introduces two innovative modules: a point-level data alignment module based on geometric transformations and an enhanced fusion module utilizing cross-attention mechanisms. These modules achieve precise point-level alignment and seamless feature fusion between point clouds and RGB images. Experiments on the nuScenes-Occupancy dataset demonstrate that the proposed AlignFusionNet outperforms baseline methods, achieving a significant 15.9% improvement in mIoU and a 4% increase in IoU. Compared to the previous state-of-the-art method, OccGen, mIoU is improved by 5.9%. Further qualitative visualization analysis shows that the proposed method achieves higher representation accuracy for small objects. |
| format | Article |
| id | doaj-art-5d5ea1ab61a441aeaa45c3b9cf435201 |
| institution | DOAJ |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | Record ID: doaj-art-5d5ea1ab61a441aeaa45c3b9cf435201; timestamp: 2025-08-20T03:13:42Z; language: eng; publisher: IEEE; journal: IEEE Access (ISSN 2169-3536); published: 2025-01-01; volume 13, pages 125003-125015; DOI: 10.1109/ACCESS.2025.3589858; IEEE document ID: 11082274. Title: AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction. Authors: Ziyi Xu (https://orcid.org/0009-0008-1548-9642), Legan Qi, Hongzhou Du, Jiaqi Yang, Zhenglin Chen (https://orcid.org/0009-0007-9229-9693). Affiliations, in author order: authors 1-3, Department of Electrical and Computer Engineering, University of Macau, Macau, China; authors 4-5, Zhejiang Key Laboratory of Imaging and Interventional Medicine, Zhejiang Engineering Research Center of Interventional Medicine Engineering and Biotechnology, The Fifth Affiliated Hospital of Wenzhou Medical University, Lishui, China. URL: https://ieeexplore.ieee.org/document/11082274/. Keywords: 3D occupancy prediction; point cloud; multi-view image; multimodal feature alignment; cross-attention mechanisms |
| title | AlignFusionNet: Efficient Cross-Modal Alignment and Fusion for 3D Semantic Occupancy Prediction |
| topic | 3D occupancy prediction; point cloud; multi-view image; multimodal feature alignment; cross-attention mechanisms |
| url | https://ieeexplore.ieee.org/document/11082274/ |