The TDGL Module: A Fast Multi-Scale Vision Sensor Based on a Transformation Dilated Grouped Layer
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/11/3339 |
| Summary: | Effectively capturing multi-scale object features is crucial for vision sensors used in road object detection tasks. Traditional spatial pyramid pooling methods fuse multi-scale feature information but lack the adaptability to adjust convolution operations dynamically according to actual needs. This limitation prevents them from fully utilizing spatial hierarchies and contextual information. To address this challenge, we propose a Transformation Dilated Grouped Layer (TDGL) module, a fast multi-scale vision sensor based on deep learning, designed to enhance both efficiency and accuracy in road target feature extraction networks. The TDGL module is built upon the Global Layer Normalization Convolution (GLConv) unit, which mitigates internal covariate shift by introducing scaling and offset parameters, modifying dilation strategies, and employing grouped convolution. These improvements enable the network to effectively distinguish features at different scales while optimizing spatial information processing and reducing computational costs. To validate its effectiveness, we integrate the TDGL module into the backbone of several YOLO models, forming the TDGL Net feature extractor. Experimental results on the BDD100K dataset show that TDGL Net reaches an mAP of 40.3% with about 3.1M parameters. After transformation optimization, the inference speed of TDGL Net reaches 58 FPS, which meets the requirement for real-time detection of road obstacle targets by autonomous vehicles. |
| ISSN: | 1424-8220 |
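
The summary attributes the reduced computational cost to grouped convolution and the multi-scale behaviour to dilation. The sketch below illustrates only those two general properties using the standard formulas for a 2D convolution; it is not the GLConv unit itself. The helper names, the channel count of 64, the group counts, and the dilation rates are illustrative assumptions and do not come from the article.

```python
def grouped_conv_params(c_in, c_out, kernel_size, groups=1, bias=False):
    """Learnable parameters of a 2D convolution with a square kernel.

    Each output channel only sees c_in / groups input channels,
    so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    weights = (c_in // groups) * c_out * kernel_size * kernel_size
    return weights + (c_out if bias else 0)


def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 64 channels, 3x3 kernel (values assumed).
    for groups in (1, 2, 4):
        print("groups =", groups,
              "params =", grouped_conv_params(64, 64, 3, groups))
    # Same 3x3 kernel, growing dilation: wider context, same weight count.
    for dilation in (1, 2, 3):
        print("dilation =", dilation,
              "extent =", effective_kernel_size(3, dilation))
```

Under these assumptions, moving from one group to four cuts the weight count of the layer by a factor of four, while raising the dilation rate widens the area a 3x3 kernel covers without adding any weights.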