End-to-End Multi-Modal Speaker Change Detection with Pre-Trained Models
In this work, we propose a multi-modal speaker change detection (SCD) approach with focal loss, which integrates both audio and text features to enhance detection performance. The proposed approach utilizes pre-trained large-scale models for feature extraction and incorporates a self-attention mecha...
| Main Authors: | Alymzhan Toleu, Gulmira Tolegen, Alexandr Pak, Jaxylykova Assel, Bagashar Zhumazhanov |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Applied Sciences |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2076-3417/15/8/4324 |
Similar Items
- Comparative Analysis of Audio Features for Unsupervised Speaker Change Detection
  by: Alymzhan Toleu, et al.
  Published: (2024-12-01)
- Method of speakers segmentation based on pre-segmentation
  by: ZHENG Tie-ran, et al.
  Published: (2009-01-01)
- End-to-End Multi-Speaker FastSpeech2 With Hierarchical Decoder
  by: Majid Adibian, et al.
  Published: (2025-01-01)
- Speaking out for speakers: a guide for and analysis of robot speaker design
  by: Nnamdi Nwagwu, et al.
  Published: (2024-11-01)
- Exploring Psychological Well-being of Indonesian Pre-service English Teachers as Non-native Speakers
  by: Ifa Maghfirotus Sya'idah, et al.
  Published: (2023-12-01)