Predicting Fundamental Frequency Patterns in Electrolaryngeal Speech Using Automated Phoneme Extraction

Bibliographic Details
Main Authors: Mohammad Eshghi, Tomoki Toda
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10978849/
Collection: DOAJ
Description: We propose a system to enhance electrolaryngeal speech naturalness using automatically extracted phoneme representations. Phonemes provide sufficient information for predicting reasonably natural fundamental frequency patterns. Previous studies using forced-aligned phoneme labels to create shared features between electrolaryngeal and normal speech for fundamental frequency prediction are limited by their reliance on transcriptions, thereby restricting real-time use. To overcome this, our system leverages phonetic posteriorgrams from an automatic speech recognition system. By transforming these phonetic posteriorgrams into clustered phoneme embeddings, we predict natural fundamental frequency patterns without requiring transcriptions. Our experiments demonstrate that this approach not only provides a robust, real-time solution for electrolaryngeal speech enhancement but also enables effective training with limited electrolaryngeal speech data and large, publicly available normal speech datasets.
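The step the description calls "transforming these phonetic posteriorgrams into clustered phoneme embeddings" can be illustrated with a rough sketch. This record does not give the clustering configuration, the embedding definition, or the fundamental frequency predictor, so everything below is an assumption: the posteriorgram is random synthetic data, the function name `kmeans` and the sizes `n_frames`, `n_phonemes`, and `n_clusters` are hypothetical, and treating each frame's embedding as the centroid of its cluster is only one plausible reading of the phrase.

```python
import numpy as np

def kmeans(frames, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on row vectors; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames.
    centroids = frames[rng.choice(len(frames), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = frames[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    # Final assignment against the centroids that are returned.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)

# Synthetic stand-in for a phonetic posteriorgram (assumption, not real data):
# one probability distribution over phoneme classes per frame.
rng = np.random.default_rng(0)
n_frames, n_phonemes, n_clusters = 200, 40, 8
logits = rng.normal(size=(n_frames, n_phonemes))
ppg = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Cluster the frames, then represent each frame by its cluster centroid.
centroids, cluster_ids = kmeans(ppg, n_clusters)
embeddings = centroids[cluster_ids]  # shape: (n_frames, n_phonemes)
```

In a sketch like this, the cluster indices or centroid vectors would feed a separate fundamental frequency predictor. Because they are derived from recognizer outputs rather than forced-aligned labels, no transcription is needed at inference time, which is the property the description emphasizes.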
Institution: OA Journals
ISSN: 2169-3536
Record ID: doaj-art-dd5f9901d84640a3afcdb89fc8f0ca23
Volume: 13
Pages: 73831-73847
DOI: 10.1109/ACCESS.2025.3564648
Author Details:
Mohammad Eshghi (https://orcid.org/0000-0003-3878-6363), Graduate School of Information Science, Nagoya University, Nagoya, Japan
Tomoki Toda (https://orcid.org/0000-0001-8146-1279), Information Technology Center, Nagoya University, Nagoya, Japan
Topics: Automatic speech recognition; electrolaryngeal speech; forced alignment; fundamental frequency prediction; k-means clustering; phoneme embeddings