Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection

Bibliographic Details
Main Authors: Jiahao Li, Lingshan Chen, Zhen Li
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10935618/
Description
Summary: LiDAR-Camera fusion has demonstrated remarkable potential in 3D object detection for autonomous vehicles by leveraging complementary information from both modalities. Recent state-of-the-art approaches rely primarily on projection matrices to align data across modalities. However, these methods often perform poorly under sensor misalignment or calibration errors, resulting in suboptimal fusion quality and limited robustness. In this paper, we propose a novel framework for 3D object detection, called Height-Adaptive Deformable Multi-Modal Fusion, which leverages Deformable Attention to enhance the fusion process. Specifically, we introduce a Deformable-based Cross-Modal Spatial Attention mechanism that dynamically fuses image features through learnable offsets, allowing more flexible and precise alignment between the LiDAR and camera modalities. To further improve fusion quality, we design a Height-Adaptive Aggregation strategy that mitigates the risk of incorrect fusion from background points while emphasizing the aggregation of foreground object features. In addition, we introduce projection noise to simulate misalignment scenarios and add an extra supervision loss to address them. Extensive experiments on the nuScenes benchmark demonstrate the effectiveness and robustness of the proposed framework. Specifically, our method significantly outperforms the LiDAR-only method and shows less precision degradation under sensor misalignment than other fusion-based approaches. These results validate the potential of the proposed framework for improving 3D object detection accuracy, particularly in real-world, imperfect sensor environments.
ISSN: 2169-3536
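
Note: the summary describes projecting LiDAR points into the image with a projection matrix, perturbing that projection to simulate misalignment, and sampling image features at learnable offsets around the projected location. The sketch below illustrates those three steps in plain Python so the idea is concrete. It is not the authors' implementation: the function names, the Gaussian pixel-noise model, and the hand-picked offsets and weights (which a network would predict in practice) are all assumptions made for illustration, and the Height-Adaptive Aggregation and the extra supervision loss are not modeled.

```python
import math
import random


def project_point(P, xyz):
    """Project a 3D point to pixel coordinates with a 3x4 projection matrix P."""
    x, y, z = xyz
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    return h[0] / h[2], h[1] / h[2]


def add_projection_noise(uv, sigma, rng):
    """Perturb a projected pixel with Gaussian noise (assumed noise model)."""
    return uv[0] + rng.gauss(0.0, sigma), uv[1] + rng.gauss(0.0, sigma)


def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2D feature map feat[row][col] at (x, y).

    Neighbors that fall outside the map contribute zero.
    """
    height, width = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < width and 0 <= yi < height:
                value += wx * wy * feat[yi][xi]
    return value


def deformable_sample(feat, ref, offsets, weights):
    """Weighted sum of features sampled at ref + offset_k.

    In a deformable attention layer the offsets and weights are predicted
    from the query; here they are passed in by hand (illustrative only).
    """
    return sum(
        w * bilinear_sample(feat, ref[0] + dx, ref[1] + dy)
        for (dx, dy), w in zip(offsets, weights)
    )


if __name__ == "__main__":
    # Toy 4x4 single-channel "image feature map" and a toy projection matrix.
    feat = [[float(10 * r + c) for c in range(4)] for r in range(4)]
    P = [[1.0, 0.0, 1.5, 0.0], [0.0, 1.0, 1.5, 0.0], [0.0, 0.0, 1.0, 0.0]]

    ref = project_point(P, (1.0, 0.0, 2.0))
    noisy_ref = add_projection_noise(ref, sigma=0.5, rng=random.Random(0))

    offsets = [(0.0, 0.0), (1.0, 0.0), (0.0, -1.0)]
    weights = [0.5, 0.25, 0.25]
    print("clean reference:", ref, deformable_sample(feat, ref, offsets, weights))
    print("noisy reference:", noisy_ref, deformable_sample(feat, noisy_ref, offsets, weights))
```

With a fixed projection, any calibration error shifts the single sampling location directly. Sampling at several offset locations gives the model room to look near the projected point, which is the intuition behind using learnable offsets for alignment.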