Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study

This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi, particularly in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to...

Bibliographic Details
Main Authors: Hemant Palivela, Meera Narvekar, David Asirvatham, Shashi Bhushan, Vinay Rishiwal, Udit Agarwal
Format: Article
Language: English
Published: IEEE 2025-01-01
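Series: IEEE Access
Subjects: Code-switching; automatic speech recognition; low-resource languages; Hindi; Marathi; Indic languages
Online Access: https://ieeexplore.ieee.org/document/10835062/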
_version_ 1832592924662038528
author Hemant Palivela
Meera Narvekar
David Asirvatham
Shashi Bhushan
Vinay Rishiwal
Udit Agarwal
collection DOAJ
description This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi, particularly in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to low-resource languages with limited data for training models. These challenges are exacerbated in multilingual settings, particularly during code-switching, where speakers alternate between languages within a conversation. This paper underscores the current state of ASR for Indic languages, highlighting linguistic complexities such as diverse sentence structures, phonetic variety, and frequent code-switching. Code-switching introduces additional challenges, as ASR systems must rapidly identify language boundaries and adapt to linguistic shifts. Present systems struggle to perform adequately with code-switched data due to the complexity of phonetic structures and the lack of comprehensive, annotated speech corpora. This work critically evaluates current methods and proposes improvements using modern deep-learning techniques to address the primary challenges in developing efficient ASR models for Hindi and Marathi. Moreover, performance comparisons of monolingual, bilingual, and multilingual ASR systems indicate that multilingual approaches are more effective in managing linguistic diversity. The efficacy of these systems can be evaluated using performance metrics such as the Phoneme Error Rate (PER) and the Word Error Rate (WER), which assess recognition accuracy at the phoneme and word level, respectively.
format Article
id doaj-art-56b5090052d74ece9be28d2393fef16c
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-56b5090052d74ece9be28d2393fef16c
2025-01-21T00:01:33Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
9171
9198
10.1109/ACCESS.2025.3527745
10835062
Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study
Hemant Palivela (0) https://orcid.org/0000-0002-5040-6979
Meera Narvekar (1)
David Asirvatham (2)
Shashi Bhushan (3)
Vinay Rishiwal (4) https://orcid.org/0000-0003-2451-4949
Udit Agarwal (5) https://orcid.org/0000-0003-4353-0274
Damac Properties, LLC, Dubai, United Arab Emirates
Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India
Asia School of Business, Kuala Lumpur, Malaysia
Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia
Department of CSIT, MJP Rohilkhand University, Bareilly, India
Department of CSA, RBMI Group of Institutions, Bareilly, India
https://ieeexplore.ieee.org/document/10835062/
Code-switching
automatic speech recognition
low-resource languages
Hindi
Marathi
Indic languages
title Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study
topic Code-switching
automatic speech recognition
low-resource languages
Hindi
Marathi
Indic languages
url https://ieeexplore.ieee.org/document/10835062/