Learning From Detailed Maps: Joint 2D-3D Semantic Segmentation for Airborne Data with Selective Label Fusion

Bibliographic Details
Main Authors: G. Anjanappa, S. Oude Elberink, G. Vosselman
Format: Article
Language: English
Published: Copernicus Publications 2025-07-01
Series: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Online Access: https://isprs-annals.copernicus.org/articles/X-G-2025/101/2025/isprs-annals-X-G-2025-101-2025.pdf
Description
Summary: Objects for topographic maps are often extracted manually by interpreting and segmenting airborne data, such as 2D images and 3D point clouds. Deep learning (DL) with semantic segmentation can automate this process using existing maps as ground labels. However, current map-based DL methods are limited to either 2D or 3D, focus on urban regions, segment only a few generic classes, and overlook the effects of abstractions in map-derived labels. To overcome these limitations, we propose a segmentation method that uses maps as ground truth with (i) joint 2D and 3D networks using multi-scale feature learning to capture fine details and segment diverse objects and (ii) a Selective Label Fusion module to refine predictions across both modalities, addressing the effects of map abstractions. Trained and tested in urban, rural, and forested regions, our method segments 11 map-based classes in 2D and 12 classes in 3D. At the class level, we achieve a mean Intersection over Union (mIoU) of 70% for both 2D and 3D, with label fusion improving 3D performance by 15% over non-fused results. Regionally, 4 out of 5 areas achieve mIoU above 60% in both modalities. These results demonstrate the potential of maps and DL to automate the labeling of images and point clouds, helping to create and update maps while also generating valuable labeled datasets for other computer vision tasks.
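The abstract reports class-level results as mean Intersection over Union (mIoU). For context, the following minimal sketch computes the standard mIoU metric (per-class IoU = intersection / union, averaged over classes present in the data); it is illustrative only and is not the authors' evaluation code.

```python
# Standard mean Intersection over Union (mIoU) over flat label arrays.
# Illustrative sketch of the metric named in the abstract, not the paper's code.
import numpy as np

def mean_iou(pred, target, num_classes):
    """Average per-class IoU, skipping classes absent from both arrays."""
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c)
        target_c = (target == c)
        intersection = np.logical_and(pred_c, target_c).sum()
        union = np.logical_or(pred_c, target_c).sum()
        if union > 0:  # class appears in prediction or ground truth
            ious.append(intersection / union)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1, 2, 2])
target = np.array([0, 1, 1, 1, 2, 0])
print(mean_iou(pred, target, num_classes=3))  # → 0.5
```

The same formula applies per modality: 2D predictions are scored against map-derived image labels and 3D predictions against map-derived point labels, so the reported 70% mIoU values for 2D and 3D are directly comparable.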
ISSN: 2194-9042, 2194-9050