Can large language models meet the challenge of generating school-level questions?
In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question gener...
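The zero-shot and eight-shot prompting the abstract mentions can be illustrated with a minimal sketch. This is a hypothetical illustration only: the prompt wording, exemplars, and helper name are assumptions, not the authors' actual prompts.

```python
def build_prompt(exemplars, context, k=8):
    """Build a k-shot prompt: up to k (context, questions) exemplar pairs,
    followed by the new context the model should write questions for.
    k=0 reduces to zero-shot prompting (instruction + context only)."""
    parts = ["Generate a complete set of questions for the given context.", ""]
    for ex_context, ex_questions in exemplars[:k]:
        parts.append(f"Context: {ex_context}")
        parts.append("Questions:")
        parts.extend(f"- {q}" for q in ex_questions)
        parts.append("")  # blank line between exemplars
    parts.append(f"Context: {context}")
    parts.append("Questions:")
    return "\n".join(parts)


# Zero-shot: no exemplars are included, only the instruction and the target context.
zero_shot = build_prompt([], "Photosynthesis converts light energy into chemical energy.", k=0)
```

The resulting string would then be sent to an LLM; the study's comparison is between `k=0` and `k=8` versions of such prompts.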
Main Authors: | Subhankar Maity, Aniket Deroy, Sudeshna Sarkar |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2025-06-01 |
Series: | Computers and Education: Artificial Intelligence |
Subjects: | Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666920X25000104 |
_version_ | 1832590864506945536 |
---|---|
author | Subhankar Maity; Aniket Deroy; Sudeshna Sarkar |
author_facet | Subhankar Maity; Aniket Deroy; Sudeshna Sarkar |
author_sort | Subhankar Maity |
collection | DOAJ |
description | In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy. |
format | Article |
id | doaj-art-110b4e72594b46e5a000dba794efb765 |
institution | Kabale University |
issn | 2666-920X |
language | English |
publishDate | 2025-06-01 |
publisher | Elsevier |
record_format | Article |
series | Computers and Education: Artificial Intelligence |
spelling | doaj-art-110b4e72594b46e5a000dba794efb765 2025-01-23T05:27:52Z eng Elsevier Computers and Education: Artificial Intelligence 2666-920X 2025-06-01 8 100370 Can large language models meet the challenge of generating school-level questions? Subhankar Maity (Corresponding author; Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India); Aniket Deroy (Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India); Sudeshna Sarkar (Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, 721302, India). In the realm of education, crafting appropriate questions for examinations is a meticulous and time-consuming task that is crucial for assessing students' understanding of the subject matter. This paper explores the potential of leveraging large language models (LLMs) to automate question generation in the educational domain. Specifically, we focus on generating educational questions from contexts extracted from school-level textbooks. Our study aims to prompt LLMs such as GPT-4 Turbo, GPT-3.5 Turbo, Llama-2-70B, Llama-3.1-405B, and Gemini Pro to generate a complete set of questions for each context, potentially streamlining the question generation process for educators. We performed a human evaluation of the generated questions, assessing their coverage, grammaticality, usefulness, answerability, and relevance. Additionally, we prompted LLMs to generate questions based on Bloom's revised taxonomy, categorizing and evaluating these questions according to their cognitive complexity and learning objectives. We applied both zero-shot and eight-shot prompting techniques. These efforts provide insight into the efficacy of LLMs in automated question generation and their potential in assessing students' cognitive abilities across various school-level subjects. The results show that employing an eight-shot technique improves the performance of human evaluation metrics for the generated complete set of questions and helps generate questions that are better aligned with Bloom's revised taxonomy. http://www.sciencedirect.com/science/article/pii/S2666920X25000104 Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt |
spellingShingle | Subhankar Maity; Aniket Deroy; Sudeshna Sarkar; Can large language models meet the challenge of generating school-level questions?; Computers and Education: Artificial Intelligence; Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt |
title | Can large language models meet the challenge of generating school-level questions? |
title_full | Can large language models meet the challenge of generating school-level questions? |
title_fullStr | Can large language models meet the challenge of generating school-level questions? |
title_full_unstemmed | Can large language models meet the challenge of generating school-level questions? |
title_short | Can large language models meet the challenge of generating school-level questions? |
title_sort | can large language models meet the challenge of generating school level questions |
topic | Automated question generation (AQG); Large language models (LLMs); Bloom's revised taxonomy; GPT; Prompt |
url | http://www.sciencedirect.com/science/article/pii/S2666920X25000104 |
work_keys_str_mv | AT subhankarmaity canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions AT aniketderoy canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions AT sudeshnasarkar canlargelanguagemodelsmeetthechallengeofgeneratingschoollevelquestions |