Interval evaluation of temporal (in)stability for neural machine translation

Abstract Though neural machine translation (NMT) has become the leading machine translation (MT) paradigm, its output may still contain errors. To improve NMT quality, it is important to investigate these errors and to see how NMT quality changes with time. The primary focus of the paper is on what...

Bibliographic Details
Main Authors:	Anna Egorova, Mikhail Kruzhkov, Vitaly Nuriev, Igor Zatsman
Format:	Article
Language:	English
Published:	Springer 2025-01-01
Series:	Discover Artificial Intelligence
Subjects:	Neural machine translation; Temporal evaluation; Temporal (in)stability of neural machine translation; Indicator-based evaluation; Linguistic annotation; Error typology
Online Access:	https://doi.org/10.1007/s44163-025-00222-y

_version_	1832585549205995520
author	Anna Egorova Mikhail Kruzhkov Vitaly Nuriev Igor Zatsman
collection	DOAJ
description	Though neural machine translation (NMT) has become the leading machine translation (MT) paradigm, its output may still contain errors. To improve NMT quality, it is important to investigate these errors and to see how NMT quality changes with time. The primary focus of the paper is on what is referred to here as “temporal (in)stability of NMT”, a phenomenon that was uncovered in a year-long experiment and may be studied by applying interval evaluation methods. The paper presents data collected while observing how far, if at all, Google’s Neural Machine Translation (GNMT) system progressed over a year. The data were qualitatively evaluated based on a set of indicators. To that end, 250 Russian sentences were chosen. Over the course of a year, each sentence was repeatedly translated into French using the GNMT engine at one-month intervals. The resulting translations were recorded and annotated in a specially designed supracorpora database, which made it possible to register a series of 12 translations for each of the 250 Russian sentences. Annotating the translations required developing an error typology that would help reveal whether the NMT system improved its output quality. The year-long experiment shows that NMT quality not only improves but may also decrease with time.
format	Article
id	doaj-art-1730881d007f47789051af3299af8e6e
institution	Kabale University
issn	2731-0809
language	English
publishDate	2025-01-01
publisher	Springer
record_format	Article
series	Discover Artificial Intelligence
spelling	doaj-art-1730881d007f47789051af3299af8e6e; 2025-01-26T12:43:00Z; eng; Springer; Discover Artificial Intelligence; 2731-0809; 2025-01-01; 5; 1; 1; 17; 10.1007/s44163-025-00222-y; Interval evaluation of temporal (in)stability for neural machine translation; Anna Egorova (0), Mikhail Kruzhkov (1), Vitaly Nuriev (2), Igor Zatsman (3); (0) Institute of Informatics Problems, Federal Research Center Computer Science and Control of the Russian Academy of Sciences (FRC CSC RAS); (1) Independent researcher; (2) Center for Emerging Practices, Institute of Scientific Information for Social Sciences of the Russian Academy of Sciences (INION RAN); (3) Institute of Informatics Problems, Federal Research Center Computer Science and Control of the Russian Academy of Sciences (FRC CSC RAS); https://doi.org/10.1007/s44163-025-00222-y; Neural machine translation; Temporal evaluation; Temporal (in)stability of neural machine translation; Indicator-based evaluation; Linguistic annotation; Error typology
title	Interval evaluation of temporal (in)stability for neural machine translation
topic	Neural machine translation; Temporal evaluation; Temporal (in)stability of neural machine translation; Indicator-based evaluation; Linguistic annotation; Error typology
url	https://doi.org/10.1007/s44163-025-00222-y

Similar Items