DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation
Knowledge distillation (KD) is a model-compression method: it extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we find that naively applying mixed-image augmentation to KD harms the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we found that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually, then interpolate the two resulting logits to generate the logit used for KD. The student still receives the mixed image as input. This decoupling strategy stabilizes the teacher's logit distributions and thus yields higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset.
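The abstract describes the core DM-KD procedure: decouple the mixed input into its two source images for the teacher, interpolate the teacher's two logits, and keep the mixed image as the student's input. As a rough illustration only, the minimal PyTorch-style sketch below shows one way that procedure could be combined with a standard temperature-scaled KD loss under MixUp-style pixel mixing; the function name `dmkd_loss`, the mixing coefficient `lam`, and the temperature value are illustrative assumptions, not the authors' released implementation (CutMix would mix image regions rather than whole pixels).

```python
import torch
import torch.nn.functional as F

def dmkd_loss(student, teacher, x_a, x_b, lam, temperature=4.0):
    """Hypothetical sketch of a decoupled-logit KD objective.

    student, teacher : nn.Module classifiers returning logits
    x_a, x_b         : the two original image batches before mixing
    lam              : mixing coefficient (e.g., sampled as in MixUp/CutMix)
    """
    # Student sees the mixed image, as in ordinary mixed augmentation.
    # (Pixel-level MixUp shown here; CutMix would paste a rectangular patch instead.)
    x_mixed = lam * x_a + (1.0 - lam) * x_b
    student_logits = student(x_mixed)

    # Teacher sees the two original (unmixed) images separately,
    # and its logits are interpolated with the same coefficient.
    with torch.no_grad():
        teacher_logits = lam * teacher(x_a) + (1.0 - lam) * teacher(x_b)

    # Standard temperature-scaled KL distillation loss.
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

Under these assumptions, switching from MixUp to CutMix would only change how `x_mixed` is formed; the decoupled teacher-logit interpolation stays the same.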
Main Authors: | Jongkyung Im, Younho Jang, Junpyo Lim, Taegoo Kang, Chaoning Zhang, Sung-Ho Bae |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
Subjects: | Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
Online Access: | https://ieeexplore.ieee.org/document/10819346/ |
_version_ | 1832592922000752640 |
---|---|
author | Jongkyung Im; Younho Jang; Junpyo Lim; Taegoo Kang; Chaoning Zhang; Sung-Ho Bae |
author_facet | Jongkyung Im; Younho Jang; Junpyo Lim; Taegoo Kang; Chaoning Zhang; Sung-Ho Bae |
author_sort | Jongkyung Im |
collection | DOAJ |
description | Knowledge distillation (KD) is a model-compression method: it extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we find that naively applying mixed-image augmentation to KD harms the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we found that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually, then interpolate the two resulting logits to generate the logit used for KD. The student still receives the mixed image as input. This decoupling strategy stabilizes the teacher's logit distributions and thus yields higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset. |
format | Article |
id | doaj-art-4ff1a194154e44f7ad9e9d83ab6cdf1f |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | Record ID: doaj-art-4ff1a194154e44f7ad9e9d83ab6cdf1f; Indexed: 2025-01-21T00:00:54Z; Language: eng; Publisher: IEEE; Series: IEEE Access; ISSN: 2169-3536; Published: 2025-01-01; Volume: 13; Pages: 10527-10534; DOI: 10.1109/ACCESS.2024.3524734; IEEE document: 10819346; Title: DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation; Authors: Jongkyung Im [0] (https://orcid.org/0009-0005-7695-0752), Younho Jang [1], Junpyo Lim [2], Taegoo Kang [3], Chaoning Zhang [4] (https://orcid.org/0000-0001-6007-6099), Sung-Ho Bae [5] (https://orcid.org/0000-0003-2677-3186); Affiliations: [0] Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea; [1] Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea; [2] Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea; [3] Department of Artificial Intelligence, Kyung Hee University, Yongin-si, Republic of Korea; [4] Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea; [5] Department of Computer Science and Engineering, Kyung Hee University, Yongin-si, Republic of Korea; Abstract: Knowledge distillation (KD) is a model-compression method: it extracts valuable knowledge from a high-performance, high-capacity teacher model and transfers it to a target student model of relatively small capacity. However, we find that naively applying mixed-image augmentation to KD harms the student model's learning. We analyze this side effect of mixed augmentation in knowledge distillation and propose a new method that addresses it. Specifically, we found that mixed images tend to make the teacher generate unstable, poor-quality logits that hinder knowledge transfer. To solve this problem, we decouple an input mixed image into its two original images and feed them into the teacher model individually, then interpolate the two resulting logits to generate the logit used for KD. The student still receives the mixed image as input. This decoupling strategy stabilizes the teacher's logit distributions and thus yields higher KD performance with mixed augmentation. To verify the effectiveness of the proposed method, we experiment on various datasets and mixed-augmentation methods, showing a 0.31%-0.69% improvement in top-1 accuracy over the original KD method on the ImageNet dataset.; URL: https://ieeexplore.ieee.org/document/10819346/; Keywords: Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
spellingShingle | Jongkyung Im; Younho Jang; Junpyo Lim; Taegoo Kang; Chaoning Zhang; Sung-Ho Bae; DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation; IEEE Access; Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
title | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_full | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_fullStr | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_full_unstemmed | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_short | DM-KD: Decoupling Mixed-Images for Efficient Knowledge Distillation |
title_sort | dm kd decoupling mixed images for efficient knowledge distillation |
topic | Deep learning; knowledge distillation; augmentation; decoupling; CutMix; MixUp |
url | https://ieeexplore.ieee.org/document/10819346/ |
work_keys_str_mv | AT jongkyungim dmkddecouplingmixedimagesforefficientknowledgedistillation AT younhojang dmkddecouplingmixedimagesforefficientknowledgedistillation AT junpyolim dmkddecouplingmixedimagesforefficientknowledgedistillation AT taegookang dmkddecouplingmixedimagesforefficientknowledgedistillation AT chaoningzhang dmkddecouplingmixedimagesforefficientknowledgedistillation AT sunghobae dmkddecouplingmixedimagesforefficientknowledgedistillation |