Object Detection Post Processing Accelerator Based on Co-Design of Hardware and Software

Deep learning significantly advances object detection. Post processes, a critical component of this process, select valid bounding boxes to represent the true targets during inference and assign boxes and labels to these objects during training to optimize the loss function. However, post processes...

Bibliographic Details
Main Authors: Dengtian Yang, Lan Chen, Xiaoran Hao, Yiheng Zhang
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Information
Subjects: deep learning, object detection, post process, accelerator
Online Access: https://www.mdpi.com/2078-2489/16/1/63
_version_ 1832588343238459392
author Dengtian Yang
Lan Chen
Xiaoran Hao
Yiheng Zhang
collection DOAJ
description Deep learning significantly advances object detection. Post processes, a critical component of this process, select valid bounding boxes to represent the true targets during inference and assign boxes and labels to these objects during training to optimize the loss function. However, post processes constitute a substantial portion of the total processing time for a single image. This inefficiency primarily arises from the extensive Intersection over Union (IoU) calculations required between numerous redundant bounding boxes in post processing algorithms. To reduce these redundant IoU calculations, we introduce a classification prioritization strategy during both training and inference post processes. Additionally, post processes involve sorting operations that contribute to their inefficiency. To minimize unnecessary comparisons in Top-K sorting, we have improved the bitonic sorter by developing a hybrid bitonic algorithm. These improvements have effectively accelerated the post processing. Given the similarities between the training and inference post processes, we unify four typical post processing algorithms and design a hardware accelerator based on this framework. Our accelerator achieves at least 7.55 times the speed in inference post processing compared to that of recent accelerators. When compared to the RTX 2080 Ti system, our proposed accelerator offers at least 21.93 times the speed for the training post process and 19.89 times for the inference post process, thereby significantly enhancing the efficiency of loss function minimization.
format Article
id doaj-art-56c4b58ebdd94106be25a0f45ab1bdcd
institution Kabale University
issn 2078-2489
language English
publishDate 2025-01-01
publisher MDPI AG
record_format Article
series Information
spelling doaj-art-56c4b58ebdd94106be25a0f45ab1bdcd
2025-01-24T13:35:19Z
eng
MDPI AG
Information
2078-2489
2025-01-01
Volume 16, Issue 1, Article 63
10.3390/info16010063
Object Detection Post Processing Accelerator Based on Co-Design of Hardware and Software
Dengtian Yang, Lan Chen, Xiaoran Hao, Yiheng Zhang (all: Institute of Microelectronics of the Chinese Academy of Sciences, Beijing 100029, China)
https://www.mdpi.com/2078-2489/16/1/63
deep learning
object detection
post process
accelerator
title Object Detection Post Processing Accelerator Based on Co-Design of Hardware and Software
topic deep learning
object detection
post process
accelerator
url https://www.mdpi.com/2078-2489/16/1/63
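
The description in this record attributes much of the post-processing cost to IoU calculations between redundant bounding boxes, and mentions a classification prioritization strategy to reduce them. The following is a minimal Python sketch of that general idea only, not the authors' algorithm or hardware design: a greedy non-maximum suppression loop that can compare class labels before computing IoU, and that counts how many IoU evaluations it performs. The function names, the `class_first` flag, the threshold, and the sample boxes are all assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], labels: List[int],
        thresh: float = 0.5, class_first: bool = True) -> Tuple[List[int], int]:
    """Greedy per-class NMS; returns kept indices and the number of IoU calls.

    With class_first=True the label comparison happens before the IoU
    computation, so cross-class pairs never reach iou(). With
    class_first=False the IoU is computed for every pair and the label
    is checked afterwards. Both settings keep the same boxes.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    iou_calls = 0
    for i in order:
        suppressed = False
        for j in keep:
            same_class = labels[i] == labels[j]
            if class_first and not same_class:
                continue  # skip the IoU computation entirely
            iou_calls += 1
            if iou(boxes[i], boxes[j]) > thresh and same_class:
                suppressed = True
                break
        if not suppressed:
            keep.append(i)
    return keep, iou_calls


# Illustrative data (made up): two overlapping class-0 boxes, two class-1 boxes.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (0, 0, 10, 10), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = [0, 0, 1, 1]

kept_fast, calls_fast = nms(boxes, scores, labels, class_first=True)
kept_slow, calls_slow = nms(boxes, scores, labels, class_first=False)
print(kept_fast, calls_fast)
print(kept_slow, calls_slow)
```

In this sketch both settings keep the same boxes; the only difference is how many box pairs reach the IoU computation, which is the quantity the description identifies as the bottleneck.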