Can large language models meet the challenge of generating school-level questions?

In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy.

Bibliographic Details
Main Authors: Subhankar Maity, Aniket Deroy, Sudeshna Sarkar
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Computers and Education: Artificial Intelligence
Subjects: Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt
Online Access: http://www.sciencedirect.com/science/article/pii/S2666920X25000104
author Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
author_facet Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
author_sort Subhankar Maity
collection DOAJ
description In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy.
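To make the prompting set-up described in the abstract concrete, the following minimal Python sketch shows how a zero-shot versus few-shot (e.g. eight-shot) question-generation prompt could be assembled from a textbook context. It is not the authors' implementation; the prompt wording, exemplar format, and names are illustrative assumptions.

    # Illustrative sketch only (not the paper's code): assembles a zero-shot or
    # few-shot prompt for generating a complete set of questions from a context.
    from typing import Sequence, Tuple

    INSTRUCTION = (
        "Generate a complete set of exam questions covering the context below. "
        "Label each question with its level in Bloom's revised taxonomy "
        "(Remember, Understand, Apply, Analyze, Evaluate, Create)."
    )

    def build_prompt(context: str,
                     exemplars: Sequence[Tuple[str, Sequence[str]]] = ()) -> str:
        """Return a zero-shot prompt if `exemplars` is empty; otherwise prepend
        one (context, questions) demonstration per exemplar, e.g. eight of them
        for the eight-shot setting."""
        parts = [INSTRUCTION]
        for demo_context, demo_questions in exemplars:
            parts.append("Context:\n" + demo_context)
            parts.append("Questions:\n" + "\n".join(demo_questions))
        parts.append("Context:\n" + context)
        parts.append("Questions:")
        return "\n\n".join(parts)

    # Usage: prompt = build_prompt(textbook_context, exemplars=eight_examples)
    # The resulting string would then be sent to an LLM such as GPT-4 Turbo.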
format Article
id doaj-art-110b4e72594b46e5a000dba794efb765
institution Kabale University
issn 2666-920X
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Computers and Education: Artificial Intelligence
spelling doaj-art-110b4e72594b46e5a000dba794efb765 2025-01-23T05:27:52Z
eng; Elsevier; Computers and Education: Artificial Intelligence; 2666-920X; 2025-06-01; Vol. 8, Article 100370
Can large language models meet the challenge of generating school-level questions?
Subhankar Maity (corresponding author), Aniket Deroy, Sudeshna Sarkar; Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
http://www.sciencedirect.com/science/article/pii/S2666920X25000104
Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt
spellingShingle Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
Can large language models meet the challenge of generating school-level questions?
Computers and Education: Artificial Intelligence
Automated question generation (AQG)
Large language models (LLMs)
Bloom's revised taxonomy
GPT
Prompt
title Can large language models meet the challenge of generating school-level questions?
title_full Can large language models meet the challenge of generating school-level questions?
title_fullStr Can large language models meet the challenge of generating school-level questions?
title_full_unstemmed Can large language models meet the challenge of generating school-level questions?
title_short Can large language models meet the challenge of generating school-level questions?
title_sort can large language models meet the challenge of generating school level questions
topic Automated question generation (AQG)
Large language models (LLMs)
Bloom's revised taxonomy
GPT
Prompt
url http://www.sciencedirect.com/science/article/pii/S2666920X25000104
work_keys_str_mv AT subhankarmaity canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions
AT aniketderoy canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions
AT sudeshnasarkar canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions