A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages


Bibliographic Details
Main Authors: Shailashree K. Sheshadri, Deepa Gupta, Biswajit Paul, J. Siva Bhavani
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects:
Online Access: https://ieeexplore.ieee.org/document/11005970/
Description
Summary: Recent progress in multilingual pre-trained models has significantly improved translation quality for Indic languages. However, extending these models to new languages via fine-tuning or retraining remains computationally costly and often leads to parameter interference, degrading performance on previously learned or typologically distant languages. While continual learning offers a promising alternative for incremental language addition, its application in Indic contexts is still limited and faces challenges in generalizing across diverse linguistic settings. To overcome these issues, we propose a Continual Knowledge Transfer (CKT) framework for efficient and scalable multilingual adaptation. CKT is realized in both autoregressive (MNMT) and non-autoregressive (Switch-GLAT) architectures, yielding two variants: MNMT+CKT and Switch-GLAT+CKT. Rather than retraining the entire model, CKT freezes the multilingual base and updates only the parameters relevant to the newly added language. Key innovations include gradient-based knowledge pruning, sequential teacher integration, and dynamic vocabulary expansion, which together minimize interference and maximize cross-lingual retention. Comprehensive evaluations on the IN22-Conv and IN22-Gen benchmark datasets demonstrate that both MNMT+CKT and Switch-GLAT+CKT consistently outperform established baselines such as IndicTrans2, Google Translate, GPT-4-32K, LLaMA-2-17B, and NLIP-LAB-IITH. The proposed multi-step distillation approach, MNMT+CKT, consistently outperforms conventional fine-tuning and Knowledge Transfer (MNMT+KT) strategies for incremental adaptation of linguistically diverse Indic languages. On IN22-Conv, BLEU improvements range from +4.93 (Kashmiri → English) to +11.48 (Assamese → English), and similar improvements are observed on IN22-Gen. The method also achieves substantial reductions in trainable parameters, ranging from 19.80% (Nepali) to 66.87% (Kannada), while enabling up to 4x faster inference when integrated with the Switch-GLAT architecture. Of the two variants, Switch-GLAT+CKT achieves the highest BLEU scores across all language pairs. In the English → Indic translation direction, BLEU gains range from +9.12 (Kannada) to +26.00 (Nepali), while in the Indic → English direction, gains range from +0.07 (Odia) to +3.78 (Assamese). Furthermore, ablation studies and the sequential integration of multilingual teacher models show that CKT significantly reduces the number of trainable parameters required at each incremental step.
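To make the "freeze the base, update only language-relevant parameters" idea in the summary concrete, the following is a minimal sketch using PyTorch and Hugging Face Transformers. The base model name, the new-language token, and the choice of which modules remain trainable are illustrative assumptions for a generic multilingual seq2seq model, not the authors' released CKT implementation (which additionally applies gradient-based knowledge pruning and sequential teacher distillation).

```python
# Minimal sketch of incremental language addition with a frozen multilingual base.
# Assumptions: a placeholder mBART-50 base and a hypothetical language tag token.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

BASE_MODEL = "facebook/mbart-large-50"   # placeholder multilingual base
NEW_LANG_TOKENS = ["<2ks_Arab>"]         # hypothetical tag for a newly added language

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForSeq2SeqLM.from_pretrained(BASE_MODEL)

# Dynamic vocabulary expansion: register tokens for the new language and
# resize the embedding matrix so the model can represent them.
tokenizer.add_tokens(NEW_LANG_TOKENS)
model.resize_token_embeddings(len(tokenizer))

# Freeze the multilingual base so previously learned languages are retained.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only a small subset relevant to the new language. Here the token
# embeddings and output projection stand in for the paper's gradient-pruned
# parameter set; the full embedding matrix stays trainable for simplicity.
for module in (model.get_input_embeddings(), model.get_output_embeddings()):
    for param in module.parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.1f}%)")

# Optimize only the unfrozen parameters, then run a standard seq2seq
# training loop on parallel data for the newly added language pair.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```

Because only the unfrozen subset receives gradients, each incremental step trains a small fraction of the model, which is the source of the reported reductions in trainable parameters.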
ISSN: 2169-3536