Interval evaluation of temporal (in)stability for neural machine translation

Bibliographic Details
Main Authors: Anna Egorova, Mikhail Kruzhkov, Vitaly Nuriev, Igor Zatsman
Format: Article
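Language: English
Published: Springer, 2025-01-01
Series: Discover Artificial Intelligence
Subjects: Neural machine translation; Temporal evaluation; Temporal (in)stability of neural machine translation; Indicator-based evaluation; Linguistic annotation; Error typology
Online Access: https://doi.org/10.1007/s44163-025-00222-y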
collection DOAJ
description Abstract: Although neural machine translation (NMT) has become the leading machine translation (MT) paradigm, its output may still contain errors. To improve NMT quality, it is important to investigate these errors and to see how NMT quality changes over time. The paper focuses primarily on what is referred to here as the “temporal (in)stability of NMT”, a phenomenon that was uncovered in a year-long experiment and can be studied using interval evaluation methods. The paper presents data collected while observing how far, if at all, Google’s Neural Machine Translation (GNMT) system progressed over one year. For this experiment, 250 Russian sentences were selected. Over the course of the year, each sentence was translated into French with the GNMT engine once a month. The resulting translations were recorded and annotated in a specially designed supracorpora database, which made it possible to store a series of 12 translations for each of the 250 Russian sentences. The data were evaluated qualitatively using a set of indicators. To annotate the translations, an error typology had to be developed that would reveal whether or not the NMT system improved its output quality. The year-long experiment shows that NMT quality not only improves but may also decline over time.
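The experimental setup described in the abstract (one source sentence, 12 monthly GNMT translations, each annotated for errors) can be pictured as a small per-sentence data structure. The sketch below is a hypothetical illustration only, not the authors' supracorpora database, indicator set, or error typology: the names `SentenceSeries`, `interval_changes`, `net_trend` and `is_textually_unstable`, the reduction of each annotated translation to a single integer error count, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SentenceSeries:
    """Hypothetical record: one source sentence and its monthly machine translations.

    error_counts[i] is the number of errors annotated in translations[i].
    Reducing an annotation to one integer is an assumption, not the paper's typology.
    """
    source: str
    translations: List[str]
    error_counts: List[int]

    def __post_init__(self) -> None:
        # Every translation in the series needs exactly one error count.
        if len(self.translations) != len(self.error_counts):
            raise ValueError("translations and error_counts must have equal length")


def interval_changes(series: SentenceSeries) -> Dict[str, int]:
    """Compare each month with the previous one and tally the direction of change.

    Fewer errors than the previous month counts as 'improved', more as 'degraded',
    and an equal count as 'unchanged' (even if the wording itself changed).
    """
    tally = {"improved": 0, "degraded": 0, "unchanged": 0}
    for prev, curr in zip(series.error_counts, series.error_counts[1:]):
        if curr < prev:
            tally["improved"] += 1
        elif curr > prev:
            tally["degraded"] += 1
        else:
            tally["unchanged"] += 1
    return tally


def is_textually_unstable(series: SentenceSeries) -> bool:
    """True if the engine produced more than one distinct translation over the period."""
    return len(set(series.translations)) > 1


def net_trend(series: SentenceSeries) -> str:
    """Compare only the first and last month, ignoring what happened in between."""
    first, last = series.error_counts[0], series.error_counts[-1]
    if last < first:
        return "improved"
    if last > first:
        return "degraded"
    return "unchanged"


if __name__ == "__main__":
    # Illustrative values only; "t1".."t4" are placeholders for French translations.
    example = SentenceSeries(
        source="<Russian sentence>",
        translations=["t1", "t1", "t2", "t2", "t3", "t3", "t3", "t3", "t4", "t4", "t4", "t4"],
        error_counts=[2, 2, 1, 1, 3, 3, 3, 3, 1, 1, 1, 1],
    )
    print(interval_changes(example))  # {'improved': 2, 'degraded': 1, 'unchanged': 8}
    print(net_trend(example))         # improved
```

The invented series shows why a first-versus-last comparison alone can hide a temporary decline within the year: `net_trend` reports an improvement, while `interval_changes` records one monthly interval in which the error count rose.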
institution Kabale University
issn 2731-0809
affiliation Anna Egorova: Institute of Informatics Problems, Federal Research Center Computer Science and Control of the Russian Academy of Sciences (FRC CSC RAS)
Mikhail Kruzhkov: Independent researcher
Vitaly Nuriev: Center for Emerging Practices, Institute of Scientific Information for Social Sciences of the Russian Academy of Sciences (INION RAN)
Igor Zatsman: Institute of Informatics Problems, Federal Research Center Computer Science and Control of the Russian Academy of Sciences (FRC CSC RAS)
topic Neural machine translation
Temporal evaluation
Temporal (in)stability of neural machine translation
Indicator-based evaluation
Linguistic annotation
Error typology