Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation
In recent years, large-scale point cloud semantic segmentation has been widely applied in various fields, such as remote sensing and autonomous driving. Most existing point cloud networks use local aggregation to abstract unordered point clouds layer by layer. Among these, position embedding serves...
Main Authors: | Yu Xiao, Hui Wu, Yisheng Chen, Chongcheng Chen, Ruihai Dong, Ding Lin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Remote Sensing |
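Subjects: | positional encoding; position embedding; local aggregation; attention mechanism; large-scale point cloud; semantic segmentation |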
Online Access: | https://www.mdpi.com/2072-4292/17/2/256 |
_version_ | 1832587563394662400 |
---|---|
author | Yu Xiao Hui Wu Yisheng Chen Chongcheng Chen Ruihai Dong Ding Lin |
author_sort | Yu Xiao |
collection | DOAJ |
description | In recent years, large-scale point cloud semantic segmentation has been widely applied in fields such as remote sensing and autonomous driving. Most existing point cloud networks use local aggregation to abstract unordered point clouds layer by layer, and position embedding is a crucial step in that aggregation. However, current position embedding methods are limited in how well they model spatial relationships, especially in deeper encoders, where richer spatial positional relationships are needed. To address these issues, this paper summarizes the advantages and disadvantages of mainstream position embedding methods and proposes a novel Hybrid Offset Position Encoding (HOPE) module. The module comprises two branches that compute relative positional encoding (RPE) and offset positional encoding (OPE). RPE combines explicit encoding with attention to enhance position features, learning a position bias implicitly, while OPE calculates an absolute position offset encoding by considering differences with the grouping embeddings. The two encodings are adaptively mixed in the final output. Experiments on multiple datasets demonstrate that the module helps the deep encoders of the network capture more robust features, thereby improving the performance of various baseline models. For instance, PointNet++ and PointMetaBase enhanced with HOPE achieved mIoU gains of 2.1% and 1.3% on the large-scale indoor dataset S3DIS area-5, 2.5% and 1.1% on S3DIS 6-fold, and 1.5% and 0.6% on ScanNet, respectively. RandLA-Net with HOPE achieved a 1.4% improvement on the large-scale outdoor dataset Toronto3D. All of these gains came at minimal additional computational cost: PointNet++ and PointMetaBase each gained only approximately 0.1 M parameters. The module can serve as an alternative for position embedding and is suitable for point-based networks that require local aggregation. |
format | Article |
id | doaj-art-42fffeb8febd4109b2497250f0ad2fd3 |
institution | Kabale University |
issn | 2072-4292 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Remote Sensing |
spelling | doaj-art-42fffeb8febd4109b2497250f0ad2fd3; 2025-01-24T13:47:54Z; eng; MDPI AG; Remote Sensing; ISSN 2072-4292; 2025-01-01; vol. 17, no. 2, art. 256; DOI 10.3390/rs17020256; Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation; Yu Xiao, Hui Wu, Yisheng Chen, Chongcheng Chen, Ding Lin (The Academy of Digital China, Fuzhou University, Fuzhou 350108, China); Ruihai Dong (The School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland); https://www.mdpi.com/2072-4292/17/2/256; positional encoding; position embedding; local aggregation; attention mechanism; large-scale point cloud; semantic segmentation |
title | Hybrid Offset Position Encoding for Large-Scale Point Cloud Semantic Segmentation |
topic | positional encoding position embedding local aggregation attention mechanism large-scale point cloud semantic segmentation |
url | https://www.mdpi.com/2072-4292/17/2/256 |
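
The two-branch design described in the abstract (a relative branch weighted by attention, an offset branch taken against the grouping embedding, and an adaptive mix of the two) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the record gives no formulas, so the dot-product attention score, the use of the group mean as the reference for the offset, the sigmoid gate, and every name here (`hybrid_position_encoding`, `w_rel`, `w_off`, `w_gate`) are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def hybrid_position_encoding(centers, neighbors, feats, w_rel, w_off, w_gate):
    """Illustrative two-branch position encoding (hypothetical, not HOPE itself).

    centers:   (N, 3)    coordinates of the N group centers
    neighbors: (N, K, 3) coordinates of the K neighbors in each group
    feats:     (N, K, C) grouped point features
    w_rel:     (3, C)    linear map for the relative branch
    w_off:     (3, C)    linear map for the offset branch
    w_gate:    (2C, 1)   linear map producing the mixing gate
    Returns an (N, K, C) position encoding.
    """
    channels = feats.shape[-1]

    # Relative branch: embed neighbor-minus-center coordinates, then
    # reweight them with attention scores computed against the features.
    rel = neighbors - centers[:, None, :]                      # (N, K, 3)
    rel_emb = rel @ w_rel                                      # (N, K, C)
    scores = (rel_emb * feats).sum(-1, keepdims=True) / np.sqrt(channels)
    attn = softmax(scores, axis=1)                             # over K neighbors
    rpe = attn * rel_emb                                       # (N, K, C)

    # Offset branch (assumption): embed absolute coordinates and subtract
    # the mean embedding of the group, giving an offset from the group.
    abs_emb = neighbors @ w_off                                # (N, K, C)
    ope = abs_emb - abs_emb.mean(axis=1, keepdims=True)        # (N, K, C)

    # Adaptive mix (assumption): a sigmoid gate computed from both branches.
    gate_in = np.concatenate([rpe, ope], axis=-1)              # (N, K, 2C)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ w_gate)))           # (N, K, 1)
    return gate * rpe + (1.0 - gate) * ope
```

Under these assumptions both branches depend only on coordinate differences, so the output is unchanged when the whole scene is translated, which is one property a position encoding for local aggregation is usually expected to have.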