Predicting Fundamental Frequency Patterns in Electrolaryngeal Speech Using Automated Phoneme Extraction
| Main Authors: | , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10978849/ |
| Summary: | We propose a system to enhance electrolaryngeal speech naturalness using automatically extracted phoneme representations. Phonemes provide sufficient information for predicting reasonably natural fundamental frequency patterns. Previous studies using forced-aligned phoneme labels to create shared features between electrolaryngeal and normal speech for fundamental frequency prediction are limited by their reliance on transcriptions, thereby restricting real-time use. To overcome this, our system leverages phonetic posteriorgrams from an automatic speech recognition system. By transforming these phonetic posteriorgrams into clustered phoneme embeddings, we predict natural fundamental frequency patterns without requiring transcriptions. Our experiments demonstrate that this approach not only provides a robust, real-time solution for electrolaryngeal speech enhancement but also enables effective training with limited electrolaryngeal speech data and large, publicly available normal speech datasets. |
|---|---|
| ISSN: | 2169-3536 |
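
The summary describes turning phonetic posteriorgrams into clustered phoneme embeddings before predicting fundamental frequency. The record does not say which clustering method is used, so the following is only a minimal sketch under stated assumptions: it uses plain k-means (an assumption, not necessarily the authors' method), a hypothetical helper named `cluster_posteriorgram`, and a made-up toy posteriorgram over three phoneme classes. Each frame is replaced by the centroid of its cluster, which stands in for a "clustered phoneme embedding".

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(frame, centroids):
    """Index of the centroid closest to the given posteriorgram frame."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(frame, centroids[c]))


def cluster_posteriorgram(frames, k, iterations=20, seed=0):
    """Hypothetical helper: k-means over posteriorgram frames.

    Returns one cluster label per frame and, per frame, the centroid
    of its cluster (used here as an illustrative embedding).
    """
    rng = random.Random(seed)
    # Initialise centroids from k randomly chosen frames (assumption).
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iterations):
        labels = [nearest(f, centroids) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    labels = [nearest(f, centroids) for f in frames]
    embeddings = [centroids[lab] for lab in labels]
    return labels, embeddings


# Toy posteriorgram: 6 frames x 3 phoneme classes (invented values).
ppg = [
    [0.90, 0.05, 0.05],
    [0.85, 0.10, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.10, 0.10, 0.80],
]

labels, embeddings = cluster_posteriorgram(ppg, k=3)
for lab, emb in zip(labels, embeddings):
    print(lab, [round(v, 3) for v in emb])
```

In a full system, a sequence of such embeddings would be the input to a fundamental frequency predictor. That stage is omitted here because the record gives no details about it.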