Predicting Fundamental Frequency Patterns in Electrolaryngeal Speech Using Automated Phoneme Extraction

Bibliographic Details
Main Authors: Mohammad Eshghi, Tomoki Toda
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10978849/
Description
Summary: We propose a system to enhance electrolaryngeal speech naturalness using automatically extracted phoneme representations. Phonemes provide sufficient information for predicting reasonably natural fundamental frequency patterns. Previous studies using forced-aligned phoneme labels to create shared features between electrolaryngeal and normal speech for fundamental frequency prediction are limited by their reliance on transcriptions, thereby restricting real-time use. To overcome this, our system leverages phonetic posteriorgrams from an automatic speech recognition system. By transforming these phonetic posteriorgrams into clustered phoneme embeddings, we predict natural fundamental frequency patterns without requiring transcriptions. Our experiments demonstrate that this approach not only provides a robust, real-time solution for electrolaryngeal speech enhancement but also enables effective training with limited electrolaryngeal speech data and large, publicly available normal speech datasets.
ISSN: 2169-3536
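The summary describes turning frame-level phonetic posteriorgrams (PPGs) from an ASR model into clustered phoneme embeddings that feed a fundamental frequency (F0) predictor. The record does not say how the clustering is done, so the Python sketch below is only an illustration under stated assumptions, not the authors' method. It assumes plain k-means over PPG frames with deterministic farthest-point seeding, followed by a lookup table holding one embedding vector per cluster. The toy PPG, the 5-class phoneme inventory, the cluster count, the embedding size, and the names kmeans, run_lengths and emb_table are all hypothetical.

```python
import numpy as np


def farthest_point_init(x: np.ndarray, k: int) -> np.ndarray:
    """Deterministic seeding: start at frame 0, then repeatedly pick the
    frame farthest from every centroid chosen so far."""
    idx = [0]
    dist = ((x - x[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(dist.argmax())
        idx.append(nxt)
        dist = np.minimum(dist, ((x - x[nxt]) ** 2).sum(axis=1))
    return x[idx].copy()


def kmeans(x: np.ndarray, k: int, iters: int = 20):
    """Lloyd's k-means on the rows of x (one row per PPG frame).
    Assumed clustering step; the paper's procedure may differ."""
    centroids = farthest_point_init(x, k)
    for _ in range(iters):
        # Squared Euclidean distance from every frame to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave an emptied cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def run_lengths(labels: np.ndarray):
    """Collapse per-frame cluster IDs into (cluster_id, n_frames) segments."""
    segs = []
    for lab in labels.tolist():
        if segs and segs[-1][0] == lab:
            segs[-1][1] += 1
        else:
            segs.append([lab, 1])
    return [tuple(s) for s in segs]


# Toy PPG (hypothetical): 30 frames over a 5-class phoneme inventory,
# standing in for the frame-level posteriors an ASR model would emit.
rng = np.random.default_rng(0)
true_phones = np.repeat([2, 0, 4], 10)
logits = 4.0 * np.eye(5)[true_phones] + 0.3 * rng.standard_normal((30, 5))
ppg = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

centroids, labels = kmeans(ppg, k=3)

# Hypothetical embedding table, one vector per cluster; in a trained system
# it would be learned, here it is random just to show the lookup.
emb_table = rng.standard_normal((3, 8))
frame_embeddings = emb_table[labels]  # (30, 8): per-frame input to an F0 model

print(run_lengths(labels))
print(frame_embeddings.shape)
```

Because the embeddings are derived from ASR posteriors rather than forced-aligned labels, nothing in this pipeline needs a transcription, which is the property the summary highlights for real-time use. The F0 predictor that consumes frame_embeddings is deliberately left out of the sketch.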