Morphological and structural complexity analysis of low-resource English-Turkish language pair using neural machine translation models

Bibliographic Details
Main Authors: Mehmet Acı, Nisa Vuran Sarı, Çiğdem İnan Acı
Format: Article
Language: English
Published: PeerJ Inc. 2025-08-01
Series: PeerJ Computer Science
Subjects: Attention; Neural machine translation; Transformer; Sequence-to-sequence; GRU; Turkish
Online Access: https://peerj.com/articles/cs-3072.pdf
author Mehmet Acı
Nisa Vuran Sarı
Çiğdem İnan Acı
collection DOAJ
description Neural machine translation (NMT) has achieved remarkable success in high-resource language pairs; however, its effectiveness for morphologically rich and low-resource languages like Turkish remains underexplored. As a highly agglutinative and morphologically complex language with limited high-quality parallel data, Turkish serves as a representative case for evaluating NMT systems in low-resource and linguistically challenging settings. Its structural divergence from English makes it a critical testbed for assessing tokenization strategies, attention mechanisms, and model generalizability in neural translation. This study investigates the comparative performance of two prominent NMT paradigms, the Transformer architecture and recurrent-based sequence-to-sequence (Seq2Seq) models with attention, for both English-to-Turkish and Turkish-to-English translation. The models are evaluated under various configurations, including different tokenization strategies (Byte Pair Encoding (BPE) vs. word tokenization), attention mechanisms (Bahdanau, and an exploratory hybrid mechanism combining Bahdanau and scaled dot-product attention), and architectural depths (layer count and attention head number). Extensive experiments using automatic metrics such as BiLingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), and Translation Error Rate (TER) reveal that the Transformer model with three layers, eight attention heads, and BPE tokenization achieved the best performance, obtaining a BLEU score of 47.85 and a METEOR score of 44.62 in the English-to-Turkish direction. Similar performance trends were observed in the reverse direction, indicating the model’s generalizability. These findings highlight the potential of carefully optimized Transformer-based NMT systems in handling the complexities of morphologically rich, low-resource languages like Turkish in both translation directions.
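The abstract contrasts Bahdanau (additive) attention with scaled dot-product attention, the latter being the core of the Transformer configuration the authors report as best. As an illustration only, not the authors' implementation, a minimal NumPy sketch of the scaled dot-product component on toy matrices:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return attention output and weights for query/key/value matrices."""
    d_k = Q.shape[-1]
    # Scale scores by sqrt(d_k) so softmax inputs stay in a stable range
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy example: 2 query positions attending over 3 key/value positions
Q = np.array([[1.0, 0.0], [0.0, 1.0]])
K = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0], [2.0], [3.0]])
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows; the hybrid mechanism the abstract describes would combine weights like these with additive (Bahdanau) scores, in a manner the record itself does not specify.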
format Article
id doaj-art-4d24e3d1fafc4c0f8f6d5b7bfef13f8f
institution Kabale University
issn 2376-5992
language English
publishDate 2025-08-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
title Morphological and structural complexity analysis of low-resource English-Turkish language pair using neural machine translation models
topic Attention
Neural machine translation
Transformer
Sequence-to-sequence
GRU
Turkish
url https://peerj.com/articles/cs-3072.pdf