Evaluating large language models for criterion-based grading from agreement to consistency
Abstract: This study evaluates the ability of large language models (LLMs) to deliver criterion-based grading and examines the impact of prompt engineering with detailed criteria on grading. Using well-established human benchmarks and quantitative analyses, we found that even free LLMs achieve criter...
Main Authors:
Format: Article
Language: English
Published: Nature Portfolio, 2024-12-01
Series: npj Science of Learning
Online Access: https://doi.org/10.1038/s41539-024-00291-1