A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models

Abstract Automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in many fields. Portuguese speech recognition is gradually receiving attention because of the unique position of the language. However, the relatively scarce data res...

Bibliographic Details
Main Authors: Yan Li, Yapeng Wang, Lap Man Hoi, Dingcheng Yang, Sio-Kei Im
Format: Article
Language: English
Published: SpringerOpen 2025-01-01
Series: EURASIP Journal on Audio, Speech, and Music Processing
Subjects: Portuguese speech recognition; Review; End-to-end models
Online Access: https://doi.org/10.1186/s13636-024-00388-w
collection DOAJ
description Abstract Automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in many fields. Portuguese speech recognition is gradually receiving attention because of the unique position of the language. However, relatively scarce data resources have constrained the development and application of Portuguese speech recognition systems, and the neglect of accent issues also hinders their adoption. This study reviews the progress of end-to-end technology on the Portuguese speech recognition task. It discusses relevant research in two directions, Brazilian Portuguese recognition and European Portuguese recognition, and organizes the available corpus resources for potential researchers. Then, taking European Portuguese speech recognition as an example, it uses Fairseq-S2T and Whisper as benchmarks, tested on a 500-hour European Portuguese dataset, to estimate the performance of large-scale pre-trained models and fine-tuning techniques. Whisper obtained a word error rate (WER) of 5.11%, which the authors take to indicate that multilingual joint training can enhance generalization ability. Finally, in response to the existing problems in Portuguese speech recognition, it explores future research directions, offering new ideas for the next stage of research and system construction.
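For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between a reference transcript and a recognizer's output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are hypothetical and chosen only for illustration; the record does not say how the authors scored their systems or what text normalization they applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: splitting on whitespace with no case or
    punctuation normalization is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences: one substituted word out of five.
    print(word_error_rate("o gato dorme no sofá", "o gato come no sofá"))
```

Under this definition, a WER of 5.11% corresponds to roughly five word-level errors per hundred reference words.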
format Article
id doaj-art-eba8888c4d1c443091722512eab58292
institution Kabale University
issn 1687-4722
language English
publishDate 2025-01-01
publisher SpringerOpen
record_format Article
series EURASIP Journal on Audio, Speech, and Music Processing
spelling doaj-art-eba8888c4d1c443091722512eab58292; 2025-01-26T12:46:09Z; eng; SpringerOpen; EURASIP Journal on Audio, Speech, and Music Processing; 1687-4722; 2025-01-01; 20251113; 10.1186/s13636-024-00388-w
Yan Li (0), Yapeng Wang (1), Lap Man Hoi (2): Faculty of Applied Sciences, Macao Polytechnic University
Dingcheng Yang (3): School of Information Engineering, Nanchang University
Sio-Kei Im (4): Macao Polytechnic University
topic Portuguese speech recognition
Review
End-to-end models
url https://doi.org/10.1186/s13636-024-00388-w