Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation

In recent years, large-scale point cloud semantic segmentation has been widely applied in various fields, such as remote sensing and autonomous driving. Most existing point cloud networks use local aggregation to abstract unordered point clouds layer by layer. Among these, position embedding serves...

Bibliographic Details
Main Authors: Yu Xiao, Hui Wu, Yisheng Chen, Chongcheng Chen, Ruihai Dong, Ding Lin
Format: Article
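Language: English
Published: MDPI AG 2025-01-01
Series: Remote Sensing
Subjects: positional encoding; position embedding; local aggregation; attention mechanism; large-scale point cloud; semantic segmentation
Online Access: https://www.mdpi.com/2072-4292/17/2/256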
_version_ 1832587563394662400
author Yu Xiao
Hui Wu
Yisheng Chen
Chongcheng Chen
Ruihai Dong
Ding Lin
collection DOAJ
description In recent years, large-scale point cloud semantic segmentation has been widely applied in fields such as remote sensing and autonomous driving. Most existing point cloud networks use local aggregation to abstract unordered point clouds layer by layer, and position embedding is a crucial step in that process. However, current position embedding methods are limited in modeling spatial relationships, especially in deeper encoders, where richer spatial positional relationships are needed. To address these issues, this paper summarizes the advantages and disadvantages of mainstream position embedding methods and proposes a novel Hybrid Offset Position Encoding (HOPE) module. The module comprises two branches that compute relative positional encoding (RPE) and offset positional encoding (OPE). RPE combines explicit encoding to enhance position features through attention, learning position bias implicitly, while OPE calculates absolute position offset encoding by considering differences with grouping embeddings. The two encodings are adaptively mixed in the final output. Experiments on multiple datasets demonstrate that the module helps the deep encoders of the network capture more robust features, improving the performance of various baseline models. For instance, PointNet++ and PointMetaBase enhanced with HOPE achieved mIoU gains of 2.1% and 1.3% on the large-scale indoor dataset S3DIS area-5, 2.5% and 1.1% on S3DIS 6-fold, and 1.5% and 0.6% on ScanNet, respectively. RandLA-Net with HOPE achieved a 1.4% improvement on the large-scale outdoor dataset Toronto3D. All of these gains come with minimal additional computational cost: the parameter count of PointNet++ and PointMetaBase increased by only about 0.1 M. The module can serve as an alternative for position embedding and is suitable for point-based networks that require local aggregation.
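The two-branch idea in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the function name `hybrid_position_encoding`, the weight matrices `w_rel` and `w_abs`, the per-channel attention, the use of the group mean as the "grouping embedding", and the sigmoid gate `mix_logit` are all assumptions made only to show how a relative branch and an offset branch might be computed over a local neighborhood and then mixed.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hybrid_position_encoding(centers, neighbors, feats, w_rel, w_abs, mix_logit):
    """Hypothetical two-branch position encoding (illustrative only).

    centers:   (M, 3)    coordinates of the M group centers
    neighbors: (M, K, 3) coordinates of the K neighbors of each center
    feats:     (M, K, C) neighbor features
    w_rel:     (3, C)    assumed linear map for the relative branch
    w_abs:     (3, C)    assumed linear map for the offset branch
    mix_logit: (C,)      assumed learnable gate, one value per channel
    """
    # Relative branch: embed neighbor-minus-center coordinates, then
    # reweight the embedding with attention computed over the K neighbors.
    rel = neighbors - centers[:, None, :]
    rel_emb = np.maximum(rel @ w_rel, 0.0)           # (M, K, C), ReLU
    attn = softmax(rel_emb * feats, axis=1)          # attention over neighbors
    rpe = attn * rel_emb

    # Offset branch: embed absolute coordinates and subtract the mean
    # embedding of the group (assumed stand-in for the grouping embedding).
    abs_emb = neighbors @ w_abs                      # (M, K, C)
    ope = abs_emb - abs_emb.mean(axis=1, keepdims=True)

    # Adaptive mixing through a sigmoid gate per channel.
    gate = 1.0 / (1.0 + np.exp(-mix_logit))
    return gate * rpe + (1.0 - gate) * ope

rng = np.random.default_rng(0)
M, K, C = 4, 8, 16
centers = rng.normal(size=(M, 3))
neighbors = centers[:, None, :] + 0.1 * rng.normal(size=(M, K, 3))
feats = rng.normal(size=(M, K, C))
w_rel = rng.normal(size=(3, C))
w_abs = rng.normal(size=(3, C))
mix_logit = np.zeros(C)

out = hybrid_position_encoding(centers, neighbors, feats, w_rel, w_abs, mix_logit)
print(out.shape)
```

In this sketch both branches depend only on coordinate differences within a group, so shifting the whole point cloud by a constant vector leaves the output unchanged. That property belongs to the sketch and is not a claim about the published module.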
format Article
id doaj-art-42fffeb8febd4109b2497250f0ad2fd3
institution Kabale University
issn 2072-4292
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Remote Sensing
spelling doaj-art-42fffeb8febd4109b2497250f0ad2fd3
2025-01-24T13:47:54Z
eng
MDPI AG
Remote Sensing
2072-4292
2025-01-01
17(2), 256
10.3390/rs17020256
Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation
Yu Xiao, The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
Hui Wu, The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
Yisheng Chen, The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
Chongcheng Chen, The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
Ruihai Dong, The School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland
Ding Lin, The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
https://www.mdpi.com/2072-4292/17/2/256
title Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation
topic positional encoding
position embedding
local aggregation
attention mechanism
large-scale point cloud
semantic segmentation
url https://www.mdpi.com/2072-4292/17/2/256