Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection

Accurate and efficient object detection in UAV images is challenging because of the diversity of target scales and the large number of small targets. This study investigates enhancing the detection head with sparse convolution, demonstrating its effectiveness in achieving an optimal b...

Bibliographic Details
Main Authors:	Guimei Qi, Zhihong Yu, Jian Song
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Applied Sciences
Subjects:	UAV image object detection; SSD; multi-scale feature fusion; context-enhanced spatial sparse convolution
Online Access:	https://www.mdpi.com/2076-3417/15/2/924

_version_	1832589163272077312
author	Guimei Qi, Zhihong Yu, Jian Song
author_facet	Guimei Qi, Zhihong Yu, Jian Song
author_sort	Guimei Qi
collection	DOAJ
description	Accurate and efficient object detection in UAV images is challenging because of the diversity of target scales and the large number of small targets. This study investigates enhancing the detection head with sparse convolution, demonstrating its effectiveness in achieving an optimal balance between accuracy and efficiency. Nevertheless, the sparse convolution method incorporates global contextual information inadequately and is inflexible because of its fixed mask ratios. To address these issues, the MFFCESSC-SSD, a novel single-shot detector (SSD) with multi-scale feature fusion and context-enhanced spatial sparse convolution, is proposed in this paper. First, a global context-enhanced group normalization (CE-GN) layer is developed to address the information loss that results from applying convolution exclusively to the masked region. Subsequently, a dynamic masking strategy is designed to determine the optimal mask ratios, thereby ensuring compact foreground coverage that enhances both accuracy and efficiency. Experiments on two datasets (i.e., VisDrone and ARH2000; the latter was created by the researchers) demonstrate that the MFFCESSC-SSD substantially outperforms the SSD and numerous conventional object detection algorithms in terms of accuracy and efficiency.
format	Article
id	doaj-art-0b0cd3f91899469b81179a7117ab5980
institution	Kabale University
issn	2076-3417
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj-art-0b0cd3f91899469b81179a7117ab5980 | 2025-01-24T13:21:21Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2025-01-01 | 15 | 2 | 924 | 10.3390/app15020924 | Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection | Guimei Qi (College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China) | Zhihong Yu (College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot 010010, China) | Jian Song (College of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot 010010, China) | Accurate and efficient object detection in UAV images is challenging because of the diversity of target scales and the large number of small targets. This study investigates enhancing the detection head with sparse convolution, demonstrating its effectiveness in achieving an optimal balance between accuracy and efficiency. Nevertheless, the sparse convolution method incorporates global contextual information inadequately and is inflexible because of its fixed mask ratios. To address these issues, the MFFCESSC-SSD, a novel single-shot detector (SSD) with multi-scale feature fusion and context-enhanced spatial sparse convolution, is proposed in this paper. First, a global context-enhanced group normalization (CE-GN) layer is developed to address the information loss that results from applying convolution exclusively to the masked region. Subsequently, a dynamic masking strategy is designed to determine the optimal mask ratios, thereby ensuring compact foreground coverage that enhances both accuracy and efficiency. Experiments on two datasets (i.e., VisDrone and ARH2000; the latter was created by the researchers) demonstrate that the MFFCESSC-SSD substantially outperforms the SSD and numerous conventional object detection algorithms in terms of accuracy and efficiency. | https://www.mdpi.com/2076-3417/15/2/924 | UAV image object detection | SSD | multi-scale feature fusion | context-enhanced spatial sparse convolution
title	Multi-Scale Feature Fusion and Context-Enhanced Spatial Sparse Convolution Single-Shot Detector for Unmanned Aerial Vehicle Image Object Detection
topic	UAV image object detection; SSD; multi-scale feature fusion; context-enhanced spatial sparse convolution
url	https://www.mdpi.com/2076-3417/15/2/924

Similar Items