Automatic Short-Answer Grading in Sustainability Education: AI-Human Agreement

dc.contributor.author Emirtekin, Emrah
dc.contributor.author Ozarslan, Yasin
dc.date.accessioned 2026-04-07T11:41:20Z
dc.date.available 2026-04-07T11:41:20Z
dc.date.issued 2026
dc.description.abstract Background: Sustainability education emphasises critical thinking and interdisciplinary understanding, making the assessment of students' learning outcomes complex. While Large Language Models (LLMs) have shown promise in educational assessment, their reliability in domains requiring contextual reasoning, such as sustainability, remains unclear. Objectives: This study aims to evaluate the agreement between human raters and several LLMs (GPT-4o, Gemini 2.0 Flash, DeepSeek V3, LLaMA 3.3) in assessing short-answer responses from a university-level Sustainability course. It also investigates how this agreement varies across cognitive skill levels. Methods: A total of 232 short-answer responses were evaluated using a rubric aligned with Bloom's Revised Taxonomy. Consensus scores from human raters were compared to LLM-generated scores using multiple statistical measures, including Quadratic Weighted Kappa (QWK), Intraclass Correlation Coefficient (ICC), Pearson correlation, and distributional overlap. Results: Moderate agreement was found between LLMs and human raters in total scores (QWK: 0.585-0.640; r: 0.660-0.668; eta: 0.681-0.803). Inter-rater reliability among humans was good to excellent (ICC: 0.667-0.800). Criterion-level agreement declined as cognitive complexity increased, with notably low agreement on evaluating higher-order skills. Conclusions: Overall, LLM-human agreement was moderate on total scores but declined at higher cognitive levels, indicating that LLMs are suitable for basic comprehension checks while human oversight remains necessary for complex reasoning.
dc.identifier.doi 10.1002/jcal.70160
dc.identifier.issn 1365-2729
dc.identifier.issn 0266-4909
dc.identifier.scopus 2-s2.0-105023534229
dc.identifier.uri https://hdl.handle.net/123456789/13853
dc.identifier.uri https://doi.org/10.1002/jcal.70160
dc.language.iso en
dc.publisher Wiley
dc.relation.ispartof Journal of Computer Assisted Learning
dc.rights info:eu-repo/semantics/closedAccess
dc.subject Rubric-Based Evaluation
dc.subject Sustainability Education
dc.subject Automated Assessment
dc.subject Large Language Model
dc.subject Scoring Agreement
dc.subject Educational AI
dc.title Automatic Short-Answer Grading in Sustainability Education: AI-Human Agreement
dc.type Article
dspace.entity.type Publication
gdc.author.scopusid 57202913746
gdc.author.scopusid 37161863700
gdc.author.wosid OZARSLAN, Yasin/ABI-4442-2020
gdc.author.wosid Emirtekin, Emrah/O-1205-2018
gdc.bip.impulseclass C5
gdc.bip.influenceclass C5
gdc.bip.popularityclass C5
gdc.collaboration.industrial false
gdc.description.departmenttemp [Emirtekin, Emrah] Ege Univ, Ctr Distance Educ Applicat & Res, Izmir, Turkiye; [Ozarslan, Yasin] Yasar Univ, Dept Sci Culture, Izmir, Turkiye
gdc.description.issue 1
gdc.description.publicationcategory Article - International Refereed Journal - Institutional Faculty Member
gdc.description.volume 42
gdc.description.woscitationindex Social Science Citation Index
gdc.identifier.openalex W4416900547
gdc.identifier.wos WOS:001677104900008
gdc.index.type WoS
gdc.index.type Scopus
gdc.oaire.diamondjournal false
gdc.oaire.impulse 0.0
gdc.oaire.influence 2.3811355E-9
gdc.oaire.isgreen false
gdc.oaire.popularity 2.5970819E-9
gdc.oaire.publicfunded false
gdc.openalex.collaboration National
gdc.openalex.fwci 9.9024
gdc.openalex.normalizedpercentile 0.98
gdc.openalex.toppercent TOP 10%
gdc.opencitations.count 0
gdc.plumx.mendeley 28
gdc.plumx.newscount 1
gdc.plumx.scopuscites 0
gdc.scopus.citedcount 0
gdc.virtual.author Emirtekin, Emrah
gdc.virtual.author Özarslan, Yasin
gdc.wos.citedcount 0
relation.isAuthorOfPublication 0a1ac541-792f-40f4-b214-9861860ab9aa
relation.isAuthorOfPublication 6ffa6205-1d67-460b-b3b0-c6f8597f53e3
relation.isAuthorOfPublication.latestForDiscovery 0a1ac541-792f-40f4-b214-9861860ab9aa
relation.isOrgUnitOfPublication ac5ddece-c76d-476d-ab30-e4d3029dee37
relation.isOrgUnitOfPublication.latestForDiscovery ac5ddece-c76d-476d-ab30-e4d3029dee37
