Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection

Accurate and efficient object detection in UAV images is a challenging task due to the diversity of target scales and the massive number of small targets. This study investigates the enhancement in the detection head using sparse convolution, demonstrating its effectiveness in achieving an optimal b...

Bibliographic Details
Main Authors: Guimei Qi, Zhihong Yu, Jian Song
Format: Article
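Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
Subjects: UAV image object detection; SSD; multi-scale feature fusion; context-enhanced spatial sparse convolution
Online Access: https://www.mdpi.com/2076-3417/15/2/924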
_version_ 1832589163272077312
author Guimei Qi
Zhihong Yu
Jian Song
author_facet Guimei Qi
Zhihong Yu
Jian Song
author_sort Guimei Qi
collection DOAJ
description Accurate and efficient object detection in UAV images is challenging because target scales vary widely and small targets are extremely numerous. This study investigates enhancing the detection head with sparse convolution, demonstrating its effectiveness in balancing accuracy and efficiency. However, sparse convolution incorporates global contextual information inadequately, and its fixed mask ratios make the network inflexible. To address these issues, this paper proposes MFFCESSC-SSD, a novel single-shot detector (SSD) with multi-scale feature fusion and context-enhanced spatial sparse convolution. First, a global context-enhanced group normalization (CE-GN) layer is developed to compensate for the information lost when convolution is applied only to the masked region. Second, a dynamic masking strategy is designed to determine the optimal mask ratios, ensuring compact foreground coverage that improves both accuracy and efficiency. Experiments on two datasets, VisDrone and ARH2000 (the latter created by the researchers), demonstrate that MFFCESSC-SSD markedly outperforms the SSD and numerous conventional object detection algorithms in both accuracy and efficiency.
format Article
id doaj-art-0b0cd3f91899469b81179a7117ab5980
institution Kabale University
issn 2076-3417
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj-art-0b0cd3f91899469b81179a7117ab5980
2025-01-24T13:21:21Z
eng
MDPI AG
Applied Sciences
2076-3417
2025-01-01
15
2
924
10.3390/app15020924
Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
Guimei Qi (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China)
Zhihong Yu (College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot 010010, China)
Jian Song (College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot 010010, China)
https://www.mdpi.com/2076-3417/15/2/924
UAV image object detection
SSD
multi-scale feature fusion
context-enhanced spatial sparse convolution
spellingShingle Guimei Qi
Zhihong Yu
Jian Song
Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
Applied Sciences
UAV image object detection
SSD
multi-scale feature fusion
context-enhanced spatial sparse convolution
title Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
title_full Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
title_fullStr Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
title_full_unstemmed Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
title_short Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
title_sort multi scale feature fusion and context enhanced spatial sparse convolution single shot detector for unmanned aerial vehicle image object detection
topic UAV image object detection
SSD
multi-scale feature fusion
context-enhanced spatial sparse convolution
url https://www.mdpi.com/2076-3417/15/2/924
work_keys_str_mv AT guimeiqi multiscalefeaturefusionandcontextenhancedspatialsparseconvolutionsingleshotdetectorforunmannedaerialvehicleimageobjectdetection
AT zhihongyu multiscalefeaturefusionandcontextenhancedspatialsparseconvolutionsingleshotdetectorforunmannedaerialvehicleimageobjectdetection
AT jiansong multiscalefeaturefusionandcontextenhancedspatialsparseconvolutionsingleshotdetectorforunmannedaerialvehicleimageobjectdetection