End-to-End Multi-Modal Speaker Change Detection with Pre-Trained Models

In this work, we propose a multi-modal speaker change detection (SCD) approach with focal loss, which integrates both audio and text features to enhance detection performance. The proposed approach utilizes pre-trained large-scale models for feature extraction and incorporates a self-attention mecha...

Bibliographic Details
Main Authors:	Alymzhan Toleu, Gulmira Tolegen, Alexandr Pak, Jaxylykova Assel, Bagashar Zhumazhanov
Format:	Article
Language:	English
Published:	MDPI AG 2025-04-01
Series:	Applied Sciences
Subjects:	speaker change detection; pre-trained model; multi-modal
Online Access:	https://www.mdpi.com/2076-3417/15/8/4324

Similar Items