A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models

Bibliographic Details
Main Authors:	Yan Li, Yapeng Wang, Lap Man Hoi, Dingcheng Yang, Sio-Kei Im
Format:	Article
Language:	English
Published:	SpringerOpen 2025-01-01
Series:	EURASIP Journal on Audio, Speech, and Music Processing
Subjects:	Portuguese speech recognition; Review; End-to-end models
Online Access:	https://doi.org/10.1186/s13636-024-00388-w

author	Yan Li, Yapeng Wang, Lap Man Hoi, Dingcheng Yang, Sio-Kei Im
collection	DOAJ
description	Abstract At present, automatic speech recognition has become an important bridge for human-computer interaction and is widely applied in many fields. Portuguese speech recognition is receiving growing attention owing to the language's distinctive position. However, relatively scarce data resources have constrained the development and application of Portuguese speech recognition systems, and the neglect of accent variation further hinders their wider adoption. This study reviews research progress in applying end-to-end techniques to Portuguese speech recognition. It discusses relevant work along two lines, Brazilian Portuguese recognition and European Portuguese recognition, and compiles the available corpus resources for prospective researchers. Then, taking European Portuguese as a case study, it benchmarks Fairseq-S2T and Whisper on a 500-hour European Portuguese dataset to assess the performance of large-scale pre-trained models and fine-tuning techniques. Whisper achieved a word error rate (WER) of 5.11%, which indicates that multilingual joint training can improve generalization. Finally, in light of the open problems in Portuguese speech recognition, it explores future research directions, offering new ideas for the next stage of research and system development.
format	Article
id	doaj-art-eba8888c4d1c443091722512eab58292
institution	Kabale University
issn	1687-4722
language	English
publishDate	2025-01-01
publisher	SpringerOpen
record_format	Article
series	EURASIP Journal on Audio, Speech, and Music Processing
spelling	doaj-art-eba8888c4d1c443091722512eab58292; 2025-01-26T12:46:09Z; eng; SpringerOpen; EURASIP Journal on Audio, Speech, and Music Processing; 1687-4722; 2025-01-01; 2025; 1; 1; 13; 10.1186/s13636-024-00388-w; A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models; Yan Li (Faculty of Applied Sciences, Macao Polytechnic University); Yapeng Wang (Faculty of Applied Sciences, Macao Polytechnic University); Lap Man Hoi (Faculty of Applied Sciences, Macao Polytechnic University); Dingcheng Yang (School of Information Engineering, Nanchang University); Sio-Kei Im (Macao Polytechnic University); https://doi.org/10.1186/s13636-024-00388-w; Portuguese speech recognition; Review; End-to-end models
title	A review on speech recognition approaches and challenges for Portuguese: exploring the feasibility of fine-tuning large-scale end-to-end models
topic	Portuguese speech recognition; Review; End-to-end models
url	https://doi.org/10.1186/s13636-024-00388-w

Similar Items