Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study

This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi, particularly in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to...

Bibliographic Details
Main Authors:	Hemant Palivela, Meera Narvekar, David Asirvatham, Shashi Bhushan, Vinay Rishiwal, Udit Agarwal
Format:	Article
Language:	English
Published:	IEEE 2025-01-01
Series:	IEEE Access
Subjects:	Code-switching automatic speech recognition low-resource languages Hindi Marathi Indic languages
Online Access:	https://ieeexplore.ieee.org/document/10835062/

collection	DOAJ
description	This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi, particularly in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to low-resource languages with limited data for training models. These challenges are exacerbated in multilingual settings, particularly during code-switching, where speakers alternate between languages within a conversation. This paper underscores the current state of ASR for Indic languages, highlighting linguistic complexities such as diverse sentence structures, phonetic variety, and frequent code-switching. Code-switching introduces additional challenges, as ASR systems must rapidly identify language boundaries and adapt to linguistic shifts. Present systems struggle to perform adequately with code-switched data due to the complexity of phonetic structures and the lack of comprehensive, annotated speech corpora. This work critically evaluates current methods and proposes improvements using modern deep-learning techniques to address the primary challenges in developing efficient ASR models for Hindi and Marathi. Moreover, performance comparisons of monolingual, bilingual, and multilingual ASR systems indicate that multilingual approaches are more effective in managing linguistic diversity. The efficacy of these systems can be evaluated using performance metrics such as the Phoneme Error Rate (PER) and the Word Error Rate (WER), which assess word recognition accuracy.
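The description names Word Error Rate (WER) and Phoneme Error Rate (PER) as evaluation metrics. As a minimal sketch of how such metrics are commonly computed (not the authors' implementation), both can be expressed as the Levenshtein edit distance between a reference and a hypothesis token sequence, divided by the reference length: word tokens for WER, phoneme tokens for PER. The function names and the romanized sample tokens below are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences.

    Counts the minimum number of substitutions, deletions, and
    insertions needed to turn `ref` into `hyp`.
    """
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (can exceed 1.0)."""
    if len(ref) == 0:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(ref_text: str, hyp_text: str) -> float:
    """Word Error Rate over whitespace-separated words."""
    return error_rate(ref_text.split(), hyp_text.split())


# Hypothetical romanized tokens, for illustration only.
print(round(wer("mala pani chahiye", "mala paani chahiye"), 3))      # 0.333
# PER uses the same routine on phoneme tokens instead of words.
print(error_rate(["p", "a", "n", "i"], ["p", "a", "a", "n", "i"]))   # 0.25
```

The same routine serves both metrics because only the tokenization differs. For code-switched speech, the choice of tokenization and script normalization (for example, how romanized and Devanagari spellings are treated) is an assumption that has to be fixed before scores are comparable.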
id	doaj-art-56b5090052d74ece9be28d2393fef16c
institution	Kabale University
issn	2169-3536
spelling	doaj-art-56b5090052d74ece9be28d2393fef16c | 2025-01-21T00:01:33Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2025-01-01 | vol. 13 | pp. 9171-9198 | DOI 10.1109/ACCESS.2025.3527745 | IEEE document 10835062
	Authors: Hemant Palivela (https://orcid.org/0000-0002-5040-6979), Meera Narvekar, David Asirvatham, Shashi Bhushan, Vinay Rishiwal (https://orcid.org/0000-0003-2451-4949), Udit Agarwal (https://orcid.org/0000-0003-4353-0274)
	Affiliations (in record order): Damac Properties, LLC, Dubai, United Arab Emirates; Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India; Asia School of Business, Kuala Lumpur, Malaysia; Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia; Department of CSIT, MJP Rohilkhand University, Bareilly, India; Department of CSA, RBMI Group of Institutions, Bareilly, India
