Head information bottleneck (HIB): leveraging information bottleneck for efficient transformer head attribution and pruning

Abstract: Multi-head attention mechanisms have been widely applied in speech pre-training. However, their roles and effectiveness in various downstream tasks have not been fully explored. Attention heads may vary in importance depending on the downstream task. We assume that the attention allocation...


Bibliographic Details
Main Authors: Yukun Qian, Xuyi Zhuang, Mingjiang Wang
Format: Article
Language: English
Published: SpringerOpen, 2025-07-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Online Access: https://doi.org/10.1186/s13636-025-00411-8