Automatic Short-Answer Grading in Sustainability Education: AI-Human Agreement

Date
2026
Publisher
Wiley
Open Access Color
Green Open Access
No
Publicly Funded
No
Abstract
Background: Sustainability education emphasises critical thinking and interdisciplinary understanding, making the assessment of students' learning outcomes complex. While Large Language Models (LLMs) have shown promise in educational assessment, their reliability in domains requiring contextual reasoning, such as sustainability, remains unclear.

Objectives: This study evaluates the agreement between human raters and several LLMs (GPT-4o, Gemini 2.0 Flash, DeepSeek V3, LLaMA 3.3) in assessing short-answer responses from a university-level Sustainability course. It also investigates how this agreement varies across cognitive skill levels.

Methods: A total of 232 short-answer responses were evaluated using a rubric aligned with Bloom's Revised Taxonomy. Consensus scores from human raters were compared to LLM-generated scores using multiple statistical measures, including Quadratic Weighted Kappa (QWK), the Intraclass Correlation Coefficient (ICC), Pearson correlation, and distributional overlap.

Results: Moderate agreement was found between LLMs and human raters on total scores (QWK: 0.585-0.640; r: 0.660-0.668; eta: 0.681-0.803). Inter-rater reliability among human raters was good to excellent (ICC: 0.667-0.800). Criterion-level agreement declined as cognitive complexity increased, with notably low agreement when evaluating higher-order skills.

Conclusions: Overall, LLM-human agreement was moderate on total scores but declined at higher cognitive levels, indicating that LLMs are suitable for basic comprehension checks while human oversight remains necessary for complex reasoning.
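To illustrate the kind of agreement analysis described in the Methods, the sketch below computes Quadratic Weighted Kappa and Pearson correlation between a human consensus score vector and an LLM score vector. This is a minimal illustration, not the authors' code: the score arrays are fabricated for demonstration, and the 0-5 rubric scale is an assumption.

```python
import numpy as np

def quadratic_weighted_kappa(a, b, min_rating=None, max_rating=None):
    """Quadratic Weighted Kappa between two integer rating vectors."""
    a = np.asarray(a, dtype=int)
    b = np.asarray(b, dtype=int)
    if min_rating is None:
        min_rating = min(a.min(), b.min())
    if max_rating is None:
        max_rating = max(a.max(), b.max())
    n = max_rating - min_rating + 1

    # Observed confusion matrix of the two raters
    O = np.zeros((n, n))
    for x, y in zip(a - min_rating, b - min_rating):
        O[x, y] += 1

    # Expected matrix under chance, from the marginal rating histograms
    E = np.outer(O.sum(axis=1), O.sum(axis=0)) / O.sum()

    # Quadratic disagreement weights: penalty grows with squared distance
    idx = np.arange(n)
    W = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    return 1.0 - (W * O).sum() / (W * E).sum()

# Fabricated rubric scores on an assumed 0-5 scale (illustration only)
human = np.array([3, 4, 2, 5, 1, 3, 4, 2, 5, 3])
llm   = np.array([3, 4, 3, 4, 1, 2, 4, 2, 5, 4])

qwk = quadratic_weighted_kappa(human, llm)
r = np.corrcoef(human, llm)[0, 1]  # Pearson correlation
print(f"QWK = {qwk:.3f}, Pearson r = {r:.3f}")
```

A QWK of 1 indicates perfect agreement and 0 indicates chance-level agreement; values in the 0.58-0.64 range, as reported above, are conventionally read as moderate. The ICC for inter-rater reliability among multiple human raters would require the full rater-by-response score matrix and is omitted here.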
Keywords
Rubric-Based Evaluation, Sustainability Education, Automated Assessment, Large Language Model, Scoring Agreement, Educational AI
OpenCitations Citation Count
N/A
Source
Journal of Computer Assisted Learning
Volume
42
Issue
1
PlumX Metrics
Citations
Scopus : 0
Captures
Mendeley Readers : 28