Can large language models meet the challenge of generating school-level questions?

In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy.

Bibliographic Details
Main Authors: Subhankar Maity, Aniket Deroy, Sudeshna Sarkar
Format: Article
Language: English
Published: Elsevier 2025-06-01
Series: Computers and Education: Artificial Intelligence
Subjects: Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt
Online Access: http://www.sciencedirect.com/science/article/pii/S2666920X25000104
author Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
author_facet Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
author_sort Subhankar Maity
collection DOAJ
description In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy.
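To make the prompting set-up described in the abstract concrete, the following minimal Python sketch shows how a zero-shot versus few-shot (e.g. eight-shot) question-generation prompt could be assembled from a textbook context. It is not the authors' implementation; the prompt wording, exemplar format, and names are illustrative assumptions.

    # Illustrative sketch only (not the paper's code): assembles a zero-shot or
    # few-shot prompt for generating a complete set of questions from a context.
    from typing import Sequence, Tuple

    INSTRUCTION = (
        "Generate a complete set of exam questions covering the context below. "
        "Label each question with its level in Bloom's revised taxonomy "
        "(Remember, Understand, Apply, Analyze, Evaluate, Create)."
    )

    def build_prompt(context: str,
                     exemplars: Sequence[Tuple[str, Sequence[str]]] = ()) -> str:
        """Return a zero-shot prompt if `exemplars` is empty; otherwise prepend
        one (context, questions) demonstration per exemplar, e.g. eight of them
        for the eight-shot setting."""
        parts = [INSTRUCTION]
        for demo_context, demo_questions in exemplars:
            parts.append("Context:\n" + demo_context)
            parts.append("Questions:\n" + "\n".join(demo_questions))
        parts.append("Context:\n" + context)
        parts.append("Questions:")
        return "\n\n".join(parts)

    # Usage: prompt = build_prompt(textbook_context, exemplars=eight_examples)
    # The resulting string would then be sent to an LLM such as GPT-4 Turbo.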
format Article
id doaj-art-110b4e72594b46e5a000dba794efb765
institution Kabale University
issn 2666-920X
language English
publishDate 2025-06-01
publisher Elsevier
record_format Article
series Computers and Education: Artificial Intelligence
spelling doaj-art-110b4e72594b46e5a000dba794efb765 2025-01-23T05:27:52Z
eng; Elsevier; Computers and Education: Artificial Intelligence; 2666-920X; 2025-06-01; Vol. 8, Article 100370
Can large language models meet the challenge of generating school-level questions?
Subhankar Maity (corresponding author), Aniket Deroy, Sudeshna Sarkar; Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India
http://www.sciencedirect.com/science/article/pii/S2666920X25000104
Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt
spellingShingle Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
Can large language models meet the challenge of generating school-level questions?
Computers and Education: Artificial Intelligence
Automated question generation (AQG)
Large language models (LLMs)
Bloom's revised taxonomy
GPT
Prompt
title Can large language models meet the challenge of generating school-level questions?
title_full Can large language models meet the challenge of generating school-level questions?
title_fullStr Can large language models meet the challenge of generating school-level questions?
title_full_unstemmed Can large language models meet the challenge of generating school-level questions?
title_short Can large language models meet the challenge of generating school-level questions?
title_sort can large language models meet the challenge of generating school level questions
topic Automated question generation (AQG)
Large language models (LLMs)
Bloom's revised taxonomy
GPT
Prompt
url http://www.sciencedirect.com/science/article/pii/S2666920X25000104
work_keys_str_mv AT subhankarmaity canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions
AT aniketderoy canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions
AT sudeshnasarkar canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions