Knowledge Distillation in Object Detection for Resource-Constrained Edge Computing

Edge computing, a distributed computing paradigm that places small yet capable computing devices near data sources and IoT sensors, is gaining widespread adoption in real-world applications such as real-time intelligent drones, autonomous vehicles, and robotics. Object detection (OD) is an essential task in computer vision.


Bibliographic Details
Main Authors: Arief Setyanto, Theopilus Bayu Sasongko, Muhammad Ainul Fikri, Dhani Ariatmanto, I. Made Artha Agastya, Rakandhiya Daanii Rachmanto, Affan Ardana, In Kee Kim
Format: Article
Language: English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects: Object detection; knowledge distillation; contrastive representation distillation; YOLOv4; MobilenetV2; RepViT
Online Access: https://ieeexplore.ieee.org/document/10852314/
_version_ 1832576750167523328
author Arief Setyanto
Theopilus Bayu Sasongko
Muhammad Ainul Fikri
Dhani Ariatmanto
I. Made Artha Agastya
Rakandhiya Daanii Rachmanto
Affan Ardana
In Kee Kim
collection DOAJ
description Edge computing, a distributed computing paradigm that places small yet capable computing devices near data sources and IoT sensors, is gaining widespread adoption in real-world applications such as real-time intelligent drones, autonomous vehicles, and robotics. Object detection (OD) is an essential task in computer vision. Although state-of-the-art deep learning-based OD methods achieve high detection rates, their large model size and high computational demands often hinder deployment on resource-constrained edge devices. Given their limited memory and computational power, edge devices such as the Jetson Nano (J. Nano), Jetson Orin Nano (Orin Nano), and Raspberry Pi 4B (Raspi4B) require model optimization and compression techniques to deploy large OD models such as YOLO. YOLOv4 is a widely used OD model consisting of a backbone for image feature extraction and a prediction layer. YOLOv4 was originally designed with CSPDarkNet53 as its backbone, which requires significant computational power. In this paper, we propose replacing this backbone with a smaller model such as MobileNetV2 or RepViT. To preserve strong backbone performance, we apply knowledge distillation (KD), using CSPDarkNet53 as the teacher and the smaller model as the student. We compare various KD algorithms to identify the technique that produces a smaller model with a modest accuracy drop. According to our experiments, Contrastive Representation Distillation (CRD) yields MobileNetV2 and RepViT backbones with an acceptable accuracy drop. We consider both accuracy drop and model size when choosing MobileNetV2 or RepViT to replace CSPDarkNet53, producing the modified YOLOv4 models M-YOLO-CRD and RV-YOLO-CRD. Our evaluation shows that RV-YOLO-CRD reduces model size by 30% and achieves a better mean average precision (mAP) than M-YOLO-CRD. Our experiments show that M-YOLO-CRD significantly reduces model size (from 245.5 MB to 35.76 MB) and inference time (6× faster on CPU, 4× faster on J. Nano, and 2.5× faster on Orin Nano). While precision decreases slightly (by less than 4%), the model still performs well on edge devices. M-YOLO-CRD achieves a per-frame latency of around 37 ms on Orin Nano, 168 ms on J. Nano, and 1310 ms on Raspi4B.
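To make the distillation setup described above concrete, the sketch below (not the authors' code) trains a MobileNetV2 student backbone against a frozen teacher with a contrastive objective in the spirit of CRD. Every concrete choice here is an assumption made for illustration: torchvision ships no CSPDarkNet53, so a ResNet-50 feature extractor stands in for the teacher; the loss is a simplified in-batch InfoNCE rather than the paper's memory-bank CRD formulation; and random tensors stand in for training images.

# Minimal sketch (assumptions noted above): backbone-level knowledge distillation
# in the spirit of Contrastive Representation Distillation (CRD).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2, resnet50

class Embed(nn.Module):
    """Project pooled backbone features into a shared embedding space."""
    def __init__(self, in_dim, out_dim=128):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        x = F.adaptive_avg_pool2d(x, 1).flatten(1)  # (B, C)
        return F.normalize(self.proj(x), dim=1)     # unit-norm embeddings

def contrastive_kd_loss(z_s, z_t, temperature=0.1):
    """In-batch InfoNCE: each student embedding should match its own
    teacher embedding (positive) against the other samples (negatives)."""
    logits = z_s @ z_t.t() / temperature            # (B, B) similarity matrix
    targets = torch.arange(z_s.size(0), device=z_s.device)
    return F.cross_entropy(logits, targets)

# Teacher: frozen placeholder backbone (ResNet-50 standing in for CSPDarkNet53).
teacher = nn.Sequential(*list(resnet50(weights=None).children())[:-2]).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Student: MobileNetV2 feature extractor (1280-channel output).
student = mobilenet_v2(weights=None).features

embed_s = Embed(in_dim=1280)   # student projection head
embed_t = Embed(in_dim=2048)   # teacher projection head (trained jointly)
params = list(student.parameters()) + list(embed_s.parameters()) + list(embed_t.parameters())
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)

# One illustrative training step on random data standing in for real images.
images = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    f_t = teacher(images)
f_s = student(images)
loss = contrastive_kd_loss(embed_s(f_s), embed_t(f_t))
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"distillation loss: {loss.item():.4f}")

In the setup the abstract describes, the distilled MobileNetV2 (or RepViT) backbone would then replace CSPDarkNet53 inside YOLOv4, giving the M-YOLO-CRD and RV-YOLO-CRD variants evaluated on the edge devices.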
format Article
id doaj-art-3ede4af3bb144f0a87257a0495a66426
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-3ede4af3bb144f0a87257a0495a66426; 2025-01-31T00:01:16Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 18200-18214; DOI 10.1109/ACCESS.2025.3534020; article 10852314
Knowledge Distillation in Object Detection for Resource-Constrained Edge Computing
Arief Setyanto (https://orcid.org/0000-0003-0721-3941), Magister of Informatics, Universitas Amikom Yogyakarta, Sleman, Indonesia
Theopilus Bayu Sasongko (https://orcid.org/0009-0005-8428-9327), Department of Informatics, Universitas Amikom Yogyakarta, Sleman, Indonesia
Muhammad Ainul Fikri (https://orcid.org/0009-0001-7090-8554), Department of Informatics Engineering, Jember State Polytechnic, Jember, Indonesia
Dhani Ariatmanto (https://orcid.org/0000-0001-8877-9941), Magister of Informatics, Universitas Amikom Yogyakarta, Sleman, Indonesia
I. Made Artha Agastya (https://orcid.org/0000-0002-8739-5767), Department of Informatics, Universitas Amikom Yogyakarta, Sleman, Indonesia
Rakandhiya Daanii Rachmanto, School of Computing, University of Georgia, Athens, GA, USA
Affan Ardana (https://orcid.org/0009-0005-2234-0131), Department of Informatics, Universitas Amikom Yogyakarta, Sleman, Indonesia
In Kee Kim (https://orcid.org/0000-0003-1330-7784), School of Computing, University of Georgia, Athens, GA, USA
Online access: https://ieeexplore.ieee.org/document/10852314/
Keywords: Object detection; knowledge distillation; contrastive representation distillation; YOLOv4; MobilenetV2; RepViT
title Knowledge Distillation in Object Detection for Resource-Constrained Edge Computing
topic Object detection
knowledge distillation
contrastive representation distillation
YOLOv4
MobilenetV2
RepViT
url https://ieeexplore.ieee.org/document/10852314/