DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation

Bibliographic Details
Main Authors: Jongkyung Im, Younho Jang, Junpyo Lim, Taegoo Kang, Chaoning Zhang, Sung-Ho Bae
Format: Article
Language: English
Published: IEEE 2025-01-01
Series:IEEE Access
Subjects: Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp
Online Access: https://ieeexplore.ieee.org/document/10819346/
author Jongkyung Im
Younho Jang
Junpyo Lim
Taegoo Kang
Chaoning Zhang
Sung-Ho Bae
author_sort Jongkyung Im
collection DOAJ
description Knowledge distillation (KD) is a model-compression method that extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we discover that naively applying mixed-image augmentation to KD negatively impacts the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we find that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually; we then interpolate the two individual logits to generate the logit used for KD. The mixed image is still used as the input to the student. This decoupling strategy stabilizes the teacher's logit distributions, resulting in higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset.
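To make the decoupling idea in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: it assumes a MixUp-style mixture with coefficient lam, and the function name dm_kd_loss, the temperature T, and the teacher/student modules are placeholders introduced here for illustration.

# Minimal sketch of the decoupling strategy described in the abstract (illustrative only).
# Assumes MixUp-style mixing with ratio `lam`; `teacher`, `student`, and T are placeholders.
import torch
import torch.nn.functional as F

def dm_kd_loss(student, teacher, x_a, x_b, lam, T=4.0):
    """KD loss where the teacher sees the two original images separately.

    x_a, x_b : the two batched images being mixed (same shape)
    lam      : mixing coefficient in [0, 1]
    """
    # The student still trains on the mixed image, as in ordinary mixed augmentation.
    x_mix = lam * x_a + (1.0 - lam) * x_b
    s_logits = student(x_mix)

    # The teacher is fed the unmixed images individually, and its two logits are
    # interpolated with the same coefficient to form the distillation target.
    with torch.no_grad():
        t_logits = lam * teacher(x_a) + (1.0 - lam) * teacher(x_b)

    # Standard temperature-scaled KL distillation term.
    return F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)

In a training loop this term would typically be combined with the usual cross-entropy loss on the mixed labels; those details are omitted here.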
format Article
id doaj-art-4ff1a194154e44f7ad9e9d83ab6cdf1f
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling Jongkyung Im, Younho Jang, Junpyo Lim, Taegoo Kang, Chaoning Zhang, and Sung-Ho Bae, "DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation," IEEE Access, vol. 13, pp. 10527-10534, 2025, doi: 10.1109/ACCESS.2024.3524734, IEEE article no. 10819346 (record doaj-art-4ff1a194154e44f7ad9e9d83ab6cdf1f, timestamp 2025-01-21T00:00:54Z). ORCIDs: Jongkyung Im 0009-0005-7695-0752, Chaoning Zhang 0000-0001-6007-6099, Sung-Ho Bae 0000-0003-2677-3186. Affiliations: Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea (J. Im, J. Lim, T. Kang); Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea (Y. Jang, C. Zhang, S.-H. Bae). Abstract, keywords, and online-access URL as given in the description, topic, and url fields of this record.
title DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation
topic Deep learning
knowledge distillation
augmentation
decoupling
CutMix
MixUp
url https://ieeexplore.ieee.org/document/10819346/