DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation

Bibliographic Details
Main Authors: Jongkyung Im, Younho Jang, Junpyo Lim, Taegoo Kang, Chaoning Zhang, Sung-Ho Bae
Format: Article
Language:English
Published: IEEE 2025-01-01
Series:IEEE Access
Online Access:https://ieeexplore.ieee.org/document/10819346/
Description
Summary: Knowledge distillation (KD) is a method of model compression. It extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model with relatively small capacity. However, we discover that naively applying mixed augmentation to KD negatively impacts the student model’s learning through knowledge distillation. We analyze this side effect of mixed augmentation when using knowledge distillation and propose a new method that addresses the problem. Specifically, we found that mixed images tend to make the teacher generate unstable and poor-quality logits, which hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually. We then interpolate the two individual logits to generate a logit for KD. For the student, the mixed image is still used as input. This decoupling strategy keeps the teacher’s logit distributions stable, thus resulting in higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed augmentation methods, demonstrating that the proposed method achieves 0.31%–0.69% improvements in top-1 accuracy over the original KD method on the ImageNet dataset.
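
The decoupling scheme described in the summary can be illustrated with a short sketch. Below is a minimal PyTorch-style example, assuming a MixUp-like augmentation with mixing coefficient lam; the function and variable names (kd_loss_dmkd, teacher, student, temperature) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def kd_loss_dmkd(student, teacher, x_a, x_b, lam, temperature=4.0):
    """Sketch of decoupled-mixed-image KD: the student sees the mixed image,
    while the teacher's target logits come from the two unmixed images."""
    # Student forward pass on the mixed image, as in standard mixed augmentation.
    x_mix = lam * x_a + (1.0 - lam) * x_b
    student_logits = student(x_mix)

    # Teacher forward passes on the two original (unmixed) images.
    with torch.no_grad():
        logits_a = teacher(x_a)
        logits_b = teacher(x_b)
        # Interpolate the two individual teacher logits to form the KD target.
        teacher_logits = lam * logits_a + (1.0 - lam) * logits_b

    # Standard temperature-scaled KL-divergence distillation loss.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

In this sketch, the interpolation of the two teacher logits replaces a single teacher forward pass on the mixed image, which is the step the summary identifies as producing unstable, poor-quality logits.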
ISSN:2169-3536