LMSOE-Net: lightweight multi-scale small object enhancement network for UAV aerial images
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Springer, 2025-06-01 |
| Series: | Complex & Intelligent Systems |
| Subjects: | |
| Online Access: | https://doi.org/10.1007/s40747-025-01971-0 |
| Summary: | Detecting objects of varying scales, especially small ones, in Unmanned Aerial Vehicle (UAV) aerial images across diverse scenarios and viewpoints using onboard edge devices is a major challenge in computer vision. To tackle this issue, we propose a lightweight multi-scale small object enhancement network (LMSOE-Net) based on the YOLOv8 architecture. To improve both detection performance and model efficiency, we introduce the Efficient Multi Scale Pyramid (EMSP) neck network. This feature fusion network improves multi-scale feature extraction and integration by using convolutional modules with varying kernel sizes, which facilitates the fusion of local details with channel features. Additionally, we replace the Spatial Pyramid Pooling Fast (SPPF) module in YOLOv8 with the Feature Pyramid Shared Convolution (FPSC) module. This replacement strengthens the network’s ability to capture fine details and complex patterns, improving multi-scale feature extraction without a significant increase in parameters. We further optimize the model by incorporating shared convolution with detail-enhancement capabilities in the detection head, which improves the detection of small objects across different scales. We evaluate LMSOE-Net through ablation and comparison experiments on the VisDrone2019 and DOTAv1.5 datasets. Compared to three variants of YOLOv8, LMSOE-Net reduces parameters and computational complexity by approximately 30%, while improving mAP@0.5 and mAP@0.5:0.95 by 1–2%. The results demonstrate that integrating the proposed enhancement modules improves detection accuracy while reducing model size and computational cost. The source code is available at: https://github.com/ljdl1/LMSOE-Net . |
|---|---|
| ISSN: | 2199-4536 2198-6053 |