Cross-Supervised LiDAR-Camera Fusion for 3D Object Detection


Bibliographic Details
Main Authors: Chao Jie Zuo, Cao Yu Gu, Yi Kun Guo, Xiao Dong Miao
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: 3D object detection; LiDAR-camera system; multi-sensor fusion; BEV
Online Access: https://ieeexplore.ieee.org/document/10804146/
collection DOAJ
description Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems. Because of the inherent differences between the two modalities, finding an efficient and accurate fusion method is of great importance. Recently, significant progress has been made in 3D object detection methods based on the lift-splat-shoot (LSS) paradigm. However, inaccurate depth estimation and substantial loss of semantic information remain the main factors limiting the accuracy of 3D detection. In this paper, we propose a cross-fusion framework under a dual spatial representation: it integrates information from two spatial representations, the bird's-eye view (BEV) and the camera view, and establishes soft links between them to fully exploit the information carried by each modality. The framework consists of two components, the gated LiDAR-supervised BEV (GLS-BEV) module and the multi-attention cross-fusion (MACF) module. The former achieves accurate depth estimation by projecting LiDAR data, which carries unambiguous depth, into the image space to supervise the view transformation, constructing point cloud features from the vehicle's perspective. The latter uses three sub-attention modules with distinct roles to achieve cross-modal interaction within the same space, effectively reducing semantic loss. On the nuScenes benchmark, the proposed method achieves outstanding 3D object detection results with 71.8 mAP and 74.2 NDS. The code is available at https://github.com/zcj223311/CSDSFusion.
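
The GLS-BEV module supervises the camera branch's depth estimation by transforming LiDAR returns, which carry unambiguous depth, into the image space. As a rough illustration of that general technique, the sketch below projects LiDAR points through a calibration matrix to build a sparse depth map, then supervises an LSS-style per-pixel depth distribution only where returns exist. Every function name, tensor shape, and the loss choice here is an assumption for illustration, not the authors' implementation (see their repository for the real code).

```python
# Minimal sketch of LiDAR-supervised depth for an LSS-style view transform.
# Names, shapes, and the loss choice are illustrative assumptions, not the
# paper's actual implementation.
import torch
import torch.nn.functional as F

def project_lidar_to_image(points, lidar2img, H, W):
    """Project LiDAR points (N, 3) into an (H, W) image plane and return a
    sparse depth map with 0 where no point lands.

    lidar2img: (4, 4) homogeneous LiDAR-to-image calibration matrix (assumed).
    """
    N = points.shape[0]
    pts_h = torch.cat([points, points.new_ones(N, 1)], dim=1)  # (N, 4)
    cam = (lidar2img @ pts_h.T).T                              # (N, 4)
    depth = cam[:, 2]
    valid = depth > 0.1                                        # in front of camera
    u = cam[:, 0] / depth.clamp(min=1e-5)
    v = cam[:, 1] / depth.clamp(min=1e-5)
    valid &= (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth_map = points.new_zeros(H, W)
    depth_map[v[valid].long(), u[valid].long()] = depth[valid]
    return depth_map

def depth_supervision_loss(depth_logits, lidar_depth, depth_bins):
    """Cross-entropy between the predicted per-pixel depth distribution
    depth_logits (D, H, W) and LiDAR depth discretized into the same D bins,
    evaluated only at pixels with a LiDAR return.

    depth_bins: sorted interior bin edges, length D - 1.
    """
    D, H, W = depth_logits.shape
    mask = lidar_depth > 0
    target = torch.bucketize(lidar_depth[mask], depth_bins)  # class in [0, D-1]
    logits = depth_logits.permute(1, 2, 0)[mask]             # (M, D)
    return F.cross_entropy(logits, target)
```

The "gated" aspect of GLS-BEV presumably controls how LiDAR-derived and image-derived depth features are mixed; a learned sigmoid gate is one common realization of such a mechanism, though the record does not specify the authors' design.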
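
The MACF module is described as three sub-attention modules with different roles interacting in a shared space; their individual designs are not detailed in this record. The sketch below shows only the generic bidirectional cross-attention pattern such a fusion block builds on, with hypothetical class and parameter names; it is a stand-in for the idea, not the paper's module.

```python
# Generic cross-modal attention fusion between camera and LiDAR BEV features.
# A hypothetical stand-in for the interaction pattern, not the paper's MACF.
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cam_from_lidar = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lidar_from_cam = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, cam_bev, lidar_bev):
        """cam_bev, lidar_bev: (B, C, H, W) features on the same BEV grid,
        with C == dim (assumed)."""
        B, C, H, W = cam_bev.shape
        cam = cam_bev.flatten(2).transpose(1, 2)      # (B, HW, C)
        lidar = lidar_bev.flatten(2).transpose(1, 2)  # (B, HW, C)
        # Each modality queries the other, so information flows both ways.
        cam_enh, _ = self.cam_from_lidar(cam, lidar, lidar)
        lidar_enh, _ = self.lidar_from_cam(lidar, cam, cam)
        fused = self.out(torch.cat([cam_enh, lidar_enh], dim=-1))
        return fused.transpose(1, 2).reshape(B, C, H, W)
```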
id doaj-art-e7b4fed1b55749f39093c72a243f25ee
institution Kabale University
issn 2169-3536
doi 10.1109/ACCESS.2024.3518564
volume 13
pages 10447-10458
orcid Chao Jie Zuo: https://orcid.org/0009-0004-3997-9911
orcid Cao Yu Gu: https://orcid.org/0009-0000-0103-9562
orcid Xiao Dong Miao: https://orcid.org/0000-0002-5427-6550
affiliation School of Mechanical and Power Engineering, Nanjing Tech University, Nanjing, China (all four authors)