A Novel Approach to Continual Knowledge Transfer in Multilingual Neural Machine Translation Using Autoregressive and Non-Autoregressive Models for Indic Languages
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11005970/ |
| Summary: | Recent progress in multilingual pre-trained models has significantly improved translation quality for Indic languages. However, extending these models to new languages via fine-tuning or retraining remains computationally costly and often leads to parameter interference, degrading performance on previously learned or typologically distant languages. While continual learning offers a promising alternative for incremental language addition, its application in Indic contexts is still limited and faces challenges in generalization across diverse linguistic settings. To overcome these issues, we propose a Continual Knowledge Transfer (CKT) framework for efficient and scalable multilingual adaptation. CKT is realized in both autoregressive (MNMT) and non-autoregressive (Switch-GLAT) architectures, yielding two variants: MNMT+CKT and Switch-GLAT+CKT. Rather than retraining the entire model, CKT freezes the multilingual base and updates only parameters relevant to the newly added language. Key innovations include gradient-based knowledge pruning, sequential teacher integration, and dynamic vocabulary expansion for minimizing interference and maximizing cross-lingual retention. Comprehensive evaluations on the IN22-Conv and IN22-Gen benchmark datasets demonstrate that both MNMT+CKT and Switch-GLAT+CKT consistently outperform established baselines, such as IndicTrans2, Google Translate, GPT-4-32K, LLaMA-2-17B, and NLIP-LAB-IITH. The proposed multi-step distillation approach, MNMT+CKT, consistently outperforms conventional fine-tuning and Knowledge Transfer (MNMT+KT) strategies for incremental adaptation of linguistically diverse Indic languages. On IN22-Conv, BLEU improvements range from +4.93 (Kashmiri → English) to +11.48 (Assamese → English), and similar improvements are seen for IN22-Gen. The method also achieves substantial reductions in trainable parameters, from 19.80% (Nepali) to 66.87% (Kannada), while enabling up to 4× faster inference when integrated with the Switch-GLAT architecture. Among the two, Switch-GLAT+CKT achieves the highest BLEU scores across all language pairs. In the English → Indic translation direction, BLEU gains range from +9.12 (Kannada) to +26.00 (Nepali), while in the Indic → English direction, gains range from +0.07 (Odia) to +3.78 (Assamese). Furthermore, ablation studies and sequential integration of multilingual teacher models reveal that CKT significantly reduces the number of trainable parameters required for each incremental step. |
|---|---|
| ISSN: | 2169-3536 |
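
The abstract describes CKT as freezing the multilingual base, expanding the vocabulary for the newly added language, and distilling from multilingual teachers while training only the new parameters. The sketch below is a minimal illustration of that general recipe in PyTorch, not the authors' implementation: the `embed` attribute, the `model(src, tgt)` call signature, and the loss weighting are assumptions made for illustration, and the gradient-based knowledge pruning and Switch-GLAT specifics are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def expand_embeddings(old_emb: nn.Embedding, extra_tokens: int) -> nn.Embedding:
    """Dynamic vocabulary expansion: keep the pretrained rows, add new trainable rows."""
    new_emb = nn.Embedding(old_emb.num_embeddings + extra_tokens, old_emb.embedding_dim)
    with torch.no_grad():
        new_emb.weight[: old_emb.num_embeddings] = old_emb.weight
    return new_emb


def prepare_student(base_model: nn.Module, extra_tokens: int) -> nn.Module:
    """Freeze the multilingual base; only the expanded embedding table
    (plus any adapter modules attached for the new language) stays trainable."""
    for p in base_model.parameters():
        p.requires_grad = False
    # `embed` is a hypothetical attribute name for the shared embedding table.
    base_model.embed = expand_embeddings(base_model.embed, extra_tokens)
    return base_model


def distillation_step(student, teacher, src, tgt, optimizer, T=2.0, alpha=0.5):
    """One training step on new-language data: cross-entropy on the references
    mixed with KL distillation from a frozen multilingual teacher."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(src, tgt)          # (batch, len, teacher_vocab)
    s_logits = student(src, tgt)              # (batch, len, expanded_vocab)
    shared = t_logits.size(-1)                # distill only over the shared vocabulary
    kd = F.kl_div(
        F.log_softmax(s_logits[..., :shared] / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Simplified: targets are not shifted here as they would be in a real decoder.
    ce = F.cross_entropy(s_logits.view(-1, s_logits.size(-1)), tgt.view(-1), ignore_index=0)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this setup the optimizer would be built only over parameters that still require gradients, e.g. `torch.optim.Adam(p for p in student.parameters() if p.requires_grad)`, which is consistent with the abstract's point that each incremental language step touches only a small fraction of the model's parameters.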