Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study
This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi, particularly in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to...
Main Authors: | Hemant Palivela, Meera Narvekar, David Asirvatham, Shashi Bhushan, Vinay Rishiwal, Udit Agarwal |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2025-01-01 |
Series: | IEEE Access |
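Subjects: | Code-switching; automatic speech recognition; low-resource languages; Hindi; Marathi; Indic languages |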
Online Access: | https://ieeexplore.ieee.org/document/10835062/ |
_version_ | 1832592924662038528 |
---|---|
author | Hemant Palivela; Meera Narvekar; David Asirvatham; Shashi Bhushan; Vinay Rishiwal; Udit Agarwal |
author_sort | Hemant Palivela |
collection | DOAJ |
description | This work examines the development of Automatic Speech Recognition (ASR) systems for low-resource languages, focusing on Hindi and Marathi, particularly in multilingual and code-switching environments. ASR systems, which convert spoken language into text, face significant challenges when applied to low-resource languages for which little training data is available. These challenges are exacerbated in multilingual settings, particularly during code-switching, where speakers alternate between languages within a conversation. This paper outlines the current state of ASR for Indic languages, highlighting linguistic complexities such as diverse sentence structures, phonetic variety, and frequent code-switching. Code-switching introduces additional challenges, as ASR systems must rapidly identify language boundaries and adapt to linguistic shifts. Present systems perform poorly on code-switched data because of the complexity of phonetic structures and the lack of comprehensive, annotated speech corpora. This work critically evaluates current methods and proposes improvements using modern deep-learning techniques to address the primary challenges in developing efficient ASR models for Hindi and Marathi. Moreover, performance comparisons of monolingual, bilingual, and multilingual ASR systems indicate that multilingual approaches are more effective in managing linguistic diversity. The efficacy of these systems can be evaluated using performance metrics such as the Phoneme Error Rate (PER) and the Word Error Rate (WER), which assess recognition accuracy at the phoneme and word level, respectively. |
format | Article |
id | doaj-art-56b5090052d74ece9be28d2393fef16c |
institution | Kabale University |
issn | 2169-3536 |
language | English |
publishDate | 2025-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj-art-56b5090052d74ece9be28d2393fef16c; 2025-01-21T00:01:33Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2025-01-01; vol. 13, pp. 9171-9198; DOI 10.1109/ACCESS.2025.3527745; document 10835062; Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study; Hemant Palivela (https://orcid.org/0000-0002-5040-6979), Meera Narvekar, David Asirvatham, Shashi Bhushan, Vinay Rishiwal (https://orcid.org/0000-0003-2451-4949), Udit Agarwal (https://orcid.org/0000-0003-4353-0274); Damac Properties, LLC, Dubai, United Arab Emirates; Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India; Asia School of Business, Kuala Lumpur, Malaysia; Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Seri Iskandar, Perak, Malaysia; Department of CSIT, MJP Rohilkhand University, Bareilly, India; Department of CSA, RBMI Group of Institutions, Bareilly, India; https://ieeexplore.ieee.org/document/10835062/; Code-switching; automatic speech recognition; low-resource languages; Hindi; Marathi; Indic languages |
title | Code-Switching ASR for Low-Resource Indic Languages: A Hindi-Marathi Case Study |
topic | Code-switching; automatic speech recognition; low-resource languages; Hindi; Marathi; Indic languages |
url | https://ieeexplore.ieee.org/document/10835062/ |
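The description above names the Word Error Rate (WER) and Phoneme Error Rate (PER) as evaluation metrics. Both are conventionally defined as the edit distance (substitutions, deletions, and insertions) between a reference and a hypothesis, divided by the reference length. The following is a minimal sketch of that standard definition, not the evaluation code used in the article; the function names and the sample inputs are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalised by reference length (WER or PER)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate over whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


# Hypothetical inputs: one substituted word out of four,
# and one deleted phoneme out of five.
word_score = wer("a b c d", "a x c d")
phoneme_score = error_rate(list("kamal"), list("kaml"))
```

Passing a list of phoneme symbols to `error_rate` gives PER, and passing words gives WER, so the same routine covers both metrics. Note that the score can exceed 1.0 when the hypothesis contains many insertions.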