Predicting Fundamental Frequency Patterns in Electrolaryngeal Speech Using Automated Phoneme Extraction
We propose a system to enhance electrolaryngeal speech naturalness using automatically extracted phoneme representations. Phonemes provide sufficient information for predicting reasonably natural fundamental frequency patterns. Previous studies using forced-aligned phoneme labels to create shared fe...
| Main Authors: | Mohammad Eshghi, Tomoki Toda |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
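| Subjects: | Automatic speech recognition; electrolaryngeal speech; forced alignment; fundamental frequency prediction; k-means clustering; phoneme embeddings |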
| Online Access: | https://ieeexplore.ieee.org/document/10978849/ |
| _version_ | 1850191637966225408 |
|---|---|
| author | Mohammad Eshghi; Tomoki Toda |
| author_facet | Mohammad Eshghi; Tomoki Toda |
| author_sort | Mohammad Eshghi |
| collection | DOAJ |
| description | We propose a system to enhance electrolaryngeal speech naturalness using automatically extracted phoneme representations. Phonemes provide sufficient information for predicting reasonably natural fundamental frequency patterns. Previous studies using forced-aligned phoneme labels to create shared features between electrolaryngeal and normal speech for fundamental frequency prediction are limited by their reliance on transcriptions, thereby restricting real-time use. To overcome this, our system leverages phonetic posteriorgrams from an automatic speech recognition system. By transforming these phonetic posteriorgrams into clustered phoneme embeddings, we predict natural fundamental frequency patterns without requiring transcriptions. Our experiments demonstrate that this approach not only provides a robust, real-time solution for electrolaryngeal speech enhancement but also enables effective training with limited electrolaryngeal speech data and large, publicly available normal speech datasets. |
| format | Article |
| id | doaj-art-dd5f9901d84640a3afcdb89fc8f0ca23 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-dd5f9901d84640a3afcdb89fc8f0ca23; 2025-08-20T02:14:50Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 73831-73847; DOI 10.1109/ACCESS.2025.3564648; 10978849; Predicting Fundamental Frequency Patterns in Electrolaryngeal Speech Using Automated Phoneme Extraction; Mohammad Eshghi (https://orcid.org/0000-0003-3878-6363), Graduate School of Information Science, Nagoya University, Nagoya, Japan; Tomoki Toda (https://orcid.org/0000-0001-8146-1279), Information Technology Center, Nagoya University, Nagoya, Japan; https://ieeexplore.ieee.org/document/10978849/; Keywords: Automatic speech recognition, electrolaryngeal speech, forced alignment, fundamental frequency prediction, k-means clustering, phoneme embeddings |
| title | Predicting Fundamental Frequency Patterns in Electrolaryngeal Speech Using Automated Phoneme Extraction |
| topic | Automatic speech recognition; electrolaryngeal speech; forced alignment; fundamental frequency prediction; k-means clustering; phoneme embeddings |
| url | https://ieeexplore.ieee.org/document/10978849/ |
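
The description above mentions transforming phonetic posteriorgrams into clustered phoneme embeddings, and the record's keywords list k-means clustering. The following is a minimal sketch of that one step only, not the authors' implementation. The posteriorgram here is synthetic (drawn from a Dirichlet distribution as a stand-in for real ASR output), and the frame count, the 40 phoneme classes, the 8 clusters, and the function names are all hypothetical choices made for illustration. The fundamental frequency predictor itself is not shown.

```python
import numpy as np


def assign(frames, centroids):
    """Return the index of the nearest centroid for every frame."""
    # Squared Euclidean distance from each frame to each centroid.
    dists = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(frames, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from k distinct frames.
    start = rng.choice(len(frames), size=k, replace=False)
    centroids = frames[start].copy()
    for _ in range(n_iter):
        labels = assign(frames, centroids)
        for j in range(k):
            members = frames[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, assign(frames, centroids)


# Hypothetical stand-in for ASR output: 200 frames x 40 phoneme classes,
# each row a probability distribution over phonemes.
rng = np.random.default_rng(0)
ppg = rng.dirichlet(alpha=np.full(40, 0.1), size=200)

centroids, labels = kmeans(ppg, k=8)

# Replace every frame by its cluster centroid: one clustered embedding per frame.
embeddings = centroids[labels]
```

In this sketch each frame is represented by the centroid of its cluster, so speech from different sources that falls into the same cluster receives the same embedding. Whether the paper uses centroids, cluster indices, or a learned lookup table as the embedding is not stated in this record.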