A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models
Abstract At present, automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in multiple fields. The Portuguese speech recognition task is gradually receiving attention due to its unique language stance. However, the relatively scarce data res...
Main Authors: | Yan Li, Yapeng Wang, Lap Man Hoi, Dingcheng Yang, Sio-Kei Im |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen 2025-01-01 |
Series: | EURASIP Journal on Audio, Speech, and Music Processing |
Subjects: | Portuguese speech recognition; Review; End-to-end models |
Online Access: | https://doi.org/10.1186/s13636-024-00388-w |
_version_ | 1832585532323921920 |
---|---|
author | Yan Li; Yapeng Wang; Lap Man Hoi; Dingcheng Yang; Sio-Kei Im |
author_sort | Yan Li |
collection | DOAJ |
description | Abstract At present, automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in multiple fields. The Portuguese speech recognition task is gradually receiving attention due to its unique language stance. However, the relatively scarce data resources have constrained the development and application of Portuguese speech recognition systems. The neglect of accent issues is also detrimental to the promotion of recognition systems. This study focuses on the research progress of end-to-end technology on Portuguese speech recognition task. It discusses relevant research from two directions: Brazilian Portuguese recognition and European Portuguese recognition, and organizes available corpus resources for potential researchers. Then, taking European Portuguese speech recognition as an example, it takes the Fairseq-S2T and Whisper as benchmarks tested on a 500-h European Portuguese dataset to estimate the performance of large-scale pre-trained models and fine-tuning techniques. Whisper obtained a WER of 5.11% which indicates that multilingual joint training can enhance the generalization ability. Finally, to the existing problems in Portuguese speech recognition, it explores future research directions, which provides new ideas for the next stage of research and system construction. |
format | Article |
id | doaj-art-eba8888c4d1c443091722512eab58292 |
institution | Kabale University |
issn | 1687-4722 |
language | English |
publishDate | 2025-01-01 |
publisher | SpringerOpen |
record_format | Article |
series | EURASIP Journal on Audio, Speech, and Music Processing |
spelling | doaj-art-eba8888c4d1c443091722512eab58292; 2025-01-26T12:46:09Z; eng; SpringerOpen; EURASIP Journal on Audio, Speech, and Music Processing; 1687-4722; 2025-01-01; 20251113; 10.1186/s13636-024-00388-w; A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models; Yan Li (Faculty of Applied Sciences, Macao Polytechnic University); Yapeng Wang (Faculty of Applied Sciences, Macao Polytechnic University); Lap Man Hoi (Faculty of Applied Sciences, Macao Polytechnic University); Dingcheng Yang (School of Information Engineering, Nanchang University); Sio-Kei Im (Macao Polytechnic University); https://doi.org/10.1186/s13636-024-00388-w; Portuguese speech recognition; Review; End-to-end models |
title | A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models |
topic | Portuguese speech recognition; Review; End-to-end models |
url | https://doi.org/10.1186/s13636-024-00388-w |