Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation

In recent years, large-scale point cloud semantic segmentation has been widely applied in various fields, such as remote sensing and autonomous driving. Most existing point cloud networks use local aggregation to abstract unordered point clouds layer by layer. Among these, position embedding serves...

Bibliographic Details
Main Authors:	Yu Xiao, Hui Wu, Yisheng Chen, Chongcheng Chen, Ruihai Dong, Ding Lin
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Remote Sensing
Subjects:	positional encoding; position embedding; local aggregation; attention mechanism; large-scale point cloud; semantic segmentation
Online Access:	https://www.mdpi.com/2072-4292/17/2/256

_version_	1832587563394662400
author	Yu Xiao; Hui Wu; Yisheng Chen; Chongcheng Chen; Ruihai Dong; Ding Lin
collection	DOAJ
description	In recent years, large-scale point cloud semantic segmentation has been widely applied in fields such as remote sensing and autonomous driving. Most existing point cloud networks use local aggregation to abstract unordered point clouds layer by layer, and position embedding is a crucial step in this process. However, current position embedding methods have limitations in modeling spatial relationships, especially in deeper encoders, where richer spatial positional relationships are needed. To address these issues, this paper summarizes the advantages and disadvantages of mainstream position embedding methods and proposes a novel Hybrid Offset Position Encoding (HOPE) module. The module comprises two branches that compute relative positional encoding (RPE) and offset positional encoding (OPE). RPE uses explicit encoding to enhance position features through attention, learning position bias implicitly, while OPE calculates an absolute position offset encoding by considering differences with grouping embeddings. The two encodings are adaptively mixed in the final output. Experiments on multiple datasets demonstrate that the module helps the deep encoders of the network capture more robust features, thereby improving the performance of various baseline models. For instance, PointNet++ and PointMetaBase enhanced with HOPE achieved mIoU gains of 2.1% and 1.3% on the large-scale indoor dataset S3DIS area-5, 2.5% and 1.1% on S3DIS 6-fold, and 1.5% and 0.6% on ScanNet, respectively. RandLA-Net with HOPE achieved a 1.4% improvement on the large-scale outdoor dataset Toronto3D. All of these gains came with minimal additional computational cost; PointNet++ and PointMetaBase gained only about 0.1 M parameters. The module can serve as an alternative for position embedding and is suitable for point-based networks that require local aggregation.
format	Article
id	doaj-art-42fffeb8febd4109b2497250f0ad2fd3
institution	Kabale University
issn	2072-4292
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Remote Sensing
spelling	doaj-art-42fffeb8febd4109b2497250f0ad2fd3; 2025-01-24T13:47:54Z; eng; MDPI AG; Remote Sensing; 2072-4292; 2025-01-01; vol. 17, no. 2, art. 256; 10.3390/rs17020256; Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation
	Yu Xiao (0): The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
	Hui Wu (1): The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
	Yisheng Chen (2): The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
	Chongcheng Chen (3): The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
	Ruihai Dong (4): The School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland
	Ding Lin (5): The Academy of Digital China, Fuzhou University, Fuzhou 350108, China
	https://www.mdpi.com/2072-4292/17/2/256
	positional encoding; position embedding; local aggregation; attention mechanism; large-scale point cloud; semantic segmentation
title	Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation
topic	positional encoding; position embedding; local aggregation; attention mechanism; large-scale point cloud; semantic segmentation
url	https://www.mdpi.com/2072-4292/17/2/256
