QUA-RC: the semi-synthetic dataset of multiple choice questions for assessing reading comprehension in Ukrainian

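In this article we present the first dataset of multiple-choice questions (MCQs) for assessing reading comprehension in Ukrainian. The dataset is based on texts from the Ukrainian national tests for reading comprehension, and the MCQs themselves were created semi-automatically in three stages. In the first stage, GPT-3 was used to generate MCQs zero-shot; in the second, MCQs of sufficient quality were selected and those with minor errors were revised; in the final stage, the dataset was expanded with manually written MCQs. The dataset was created by native speakers of Ukrainian, one of whom is also a language teacher. The resulting corpus contains slightly more than 900 MCQs, of which only 43 could be kept exactly as generated by GPT-3.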

Bibliographic Details
Main Authors: Mariia Zyrianova, Dmytro Kalpakchi
Format: Article
Language: English
Published: Linköping University Electronic Press 2023-11-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/4939
author Mariia Zyrianova
Dmytro Kalpakchi
collection DOAJ
description In this article we present the first dataset of multiple-choice questions (MCQs) for assessing reading comprehension in Ukrainian. The dataset is based on texts from the Ukrainian national tests for reading comprehension, and the MCQs themselves were created semi-automatically in three stages. In the first stage, GPT-3 was used to generate MCQs zero-shot; in the second, MCQs of sufficient quality were selected and those with minor errors were revised; in the final stage, the dataset was expanded with manually written MCQs. The dataset was created by native speakers of Ukrainian, one of whom is also a language teacher. The resulting corpus contains slightly more than 900 MCQs, of which only 43 could be kept exactly as generated by GPT-3.
format Article
id doaj-art-4c2bfc1a1a9449c8a62201d28b3451ff
institution Kabale University
issn 2000-1533
language English
publishDate 2023-11-01
publisher Linköping University Electronic Press
record_format Article
series Northern European Journal of Language Technology
spelling doaj-art-4c2bfc1a1a9449c8a62201d28b3451ff; 2025-01-22T15:25:14Z; eng; Linköping University Electronic Press; Northern European Journal of Language Technology; 2000-1533; 2023-11-01; 9; 1; 10.3384/nejlt.2000-1533.2023.4939; QUA-RC: the semi-synthetic dataset of multiple choice questions for assessing reading comprehension in Ukrainian; Mariia Zyrianova (KTH Royal Institute of Technology); Dmytro Kalpakchi (KTH Royal Institute of Technology); https://nejlt.ep.liu.se/article/view/4939
title QUA-RC: the semi-synthetic dataset of multiple choice questions for assessing reading comprehension in Ukrainian
url https://nejlt.ep.liu.se/article/view/4939