Tailored knowledge distillation with automated loss function learning.

Knowledge Distillation (KD) is one of the most effective and widely used methods for compressing large models. It has achieved significant success thanks to meticulously developed distillation losses. However, most state-of-the-art KD losses are manually crafted and task-specific, raising...
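For orientation, the sketch below shows the classic temperature-scaled distillation loss (in the style of Hinton et al.) that manually crafted KD losses typically build on. It assumes PyTorch, and the hyperparameter values are illustrative; it is not the automatically learned loss proposed in this article.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Classic knowledge-distillation loss: a temperature-softened KL
    term plus standard cross-entropy. T and alpha are illustrative
    hyperparameters, not values taken from the paper."""
    # Soften both output distributions with temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence between the softened distributions; the T**2 factor
    # keeps gradient magnitudes comparable across temperatures.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * (T ** 2)
    # Hard-label cross-entropy on the student's raw logits.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce
```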


Bibliographic Details
Main Authors: Sheng Ran, Tao Huang, Wuyue Yang
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2025-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0325599