DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation
Knowledge distillation (KD) is a model-compression method: it extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we find that naively applying mixed-image augmentation to KD harms the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we found that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually, then interpolate the two resulting logits to generate the logit used for KD. The student still receives the mixed image as input. This decoupling strategy stabilizes the teacher's logit distributions and thus yields higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset.
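The abstract describes the core DM-KD procedure: decouple the mixed input into its two source images for the teacher, interpolate the teacher's two logits, and keep the mixed image as the student's input. As a rough illustration only, the minimal PyTorch-style sketch below shows one way that procedure could be combined with a standard temperature-scaled KD loss under MixUp-style pixel mixing; the function name `dmkd_loss`, the mixing coefficient `lam`, and the temperature value are illustrative assumptions, not the authors' released implementation (CutMix would mix image regions rather than whole pixels).

```python
import torch
import torch.nn.functional as F

def dmkd_loss(student, teacher, x_a, x_b, lam, temperature=4.0):
    """Hypothetical sketch of a decoupled-logit KD objective.

    student, teacher : nn.Module classifiers returning logits
    x_a, x_b         : the two original image batches before mixing
    lam              : mixing coefficient (e.g., sampled as in MixUp/CutMix)
    """
    # Student sees the mixed image, as in ordinary mixed augmentation.
    # (Pixel-level MixUp shown here; CutMix would paste a rectangular patch instead.)
    x_mixed = lam * x_a + (1.0 - lam) * x_b
    student_logits = student(x_mixed)

    # Teacher sees the two original (unmixed) images separately,
    # and its logits are interpolated with the same coefficient.
    with torch.no_grad():
        teacher_logits = lam * teacher(x_a) + (1.0 - lam) * teacher(x_b)

    # Standard temperature-scaled KL distillation loss.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

Under these assumptions, switching from MixUp to CutMix would only change how `x_mixed` is formed; the decoupled teacher-logit interpolation stays the same.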
Main Authors: | Jongkyung Im, Younho Jang, Junpyo Lim, Taegoo Kang, Chaoning Zhang, Sung-Ho Bae |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
Online Access: | https://ieeexplore.ieee.org/document/10819346/ |
_version_ | 1832592922000752640 |
---|---|
author | Jongkyung Im; Younho Jang; Junpyo Lim; Taegoo Kang; Chaoning Zhang; Sung-Ho Bae |
author_facet | Jongkyung Im; Younho Jang; Junpyo Lim; Taegoo Kang; Chaoning Zhang; Sung-Ho Bae |
author_sort | Jongkyung Im |
collection | DOAJ |
description | Knowledge distillation (KD) is a model-compression method: it extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we find that naively applying mixed-image augmentation to KD harms the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we found that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually, then interpolate the two resulting logits to generate the logit used for KD. The student still receives the mixed image as input. This decoupling strategy stabilizes the teacher's logit distributions and thus yields higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset. |
format | Article |
id | doaj-art-4ff1a194154e44f7ad9e9d83ab6cdf1f |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Record ID: doaj-art-4ff1a194154e44f7ad9e9d83ab6cdf1f; Indexed: 2025-01-21T00:00:54Z; Language: eng; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536; Published: 2025-01-01; Volume: 13; Pages: 10527-10534; DOI: 10.1109/ACCESS.2024.3524734; IEEE document: 10819346; Title: DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation; Authors: Jongkyung Im [0] (https://orcid.org/0009-0005-7695-0752), Younho Jang [1], Junpyo Lim [2], Taegoo Kang [3], Chaoning Zhang [4] (https://orcid.org/0000-0001-6007-6099), Sung-Ho Bae [5] (https://orcid.org/0000-0003-2677-3186); Affiliations: [0] Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea; [1] Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea; [2] Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea; [3] Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea; [4] Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea; [5] Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea; Abstract: Knowledge distillation (KD) is a model-compression method: it extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we find that naively applying mixed-image augmentation to KD harms the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we found that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually, then interpolate the two resulting logits to generate the logit used for KD. The student still receives the mixed image as input. This decoupling strategy stabilizes the teacher's logit distributions and thus yields higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset.; URL: https://ieeexplore.ieee.org/document/10819346/; Keywords: Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
spellingShingle | Jongkyung Im; Younho Jang; Junpyo Lim; Taegoo Kang; Chaoning Zhang; Sung-Ho Bae; DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation; IEEE Access; Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
title | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_full | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_fullStr | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_full_unstemmed | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_short | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_sort | dm kd decoupling mixed images for efficient knowledge distillation |
topic | Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
url | https://ieeexplore.ieee.org/document/10819346/ |
work_keys_str_mv | AT jongkyungim dmkddecouplingmixedimagesforefficientknowledgedistillation AT younhojang dmkddecouplingmixedimagesforefficientknowledgedistillation AT junpyolim dmkddecouplingmixedimagesforefficientknowledgedistillation AT taegookang dmkddecouplingmixedimagesforefficientknowledgedistillation AT chaoningzhang dmkddecouplingmixedimagesforefficientknowledgedistillation AT sunghobae dmkddecouplingmixedimagesforefficientknowledgedistillation |