Generative AI against bias in exam grading

At the Teaching Academy conference in November, Guðrún Rútsdóttir presented a study comparing exam grading by generative AI models, experienced teachers and novice teachers. The findings indicate that generative AI can both speed up grading and help teachers reduce bias, including against students with weaker English skills.

Guðrún Rútsdóttir presented results from her and her collaborators’ work on using generative AI to reduce bias in exam grading at the Teaching Academy conference on November 21st. She described the challenges of assessing a diverse group of students from many different backgrounds, both geographic and academic.

Previous work has shown a gender imbalance in exam grading by inexperienced teachers (Hofer, 2015), and she expressed concern about bias in grading based on students’ language proficiency. She therefore wanted to compare grading by generative AI and by teachers with different levels of teaching experience.

Three different generative AI models were used for the exam grading, and the correlation between the models was very high. The correlations between the models and experienced teachers were of a similar magnitude, whereas the correlation for novice teachers was somewhat lower. On closer inspection it emerged that novice teachers gave students with poor English skills relatively lower grades than those given by the experienced teachers and the models.

Looking at the mean grade assigned by each marker, the generative AI models gave much lower mean grades than both experienced and novice teachers. Because the models’ grades correlate highly with those of the experienced teachers, however, the AI grades can be scaled up to the teachers’ level. The results therefore indicate that generative AI can be very useful in exam grading, both to speed up the process and to help teachers assess their own bias.
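
To illustrate why a high correlation makes it possible to adjust systematically lower AI grades, here is a minimal Python sketch. It is not the method used in the study: the grades are made up for illustration, the function names are hypothetical, and a simple least-squares line is only one assumed way of mapping AI grades onto the teachers' scale.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equally long lists of grades."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fit_linear_rescaling(ai, teacher):
    """Least-squares slope and intercept mapping AI grades to teacher grades."""
    mx, my = mean(ai), mean(teacher)
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(ai, teacher))
        / sum((a - mx) ** 2 for a in ai)
    )
    intercept = my - slope * mx
    return slope, intercept


# Made-up grades for five exams (NOT data from the study).
ai_grades = [3.0, 4.5, 5.0, 6.5, 7.0]
teacher_grades = [5.0, 6.0, 7.5, 8.0, 9.5]

r = pearson(ai_grades, teacher_grades)
slope, intercept = fit_linear_rescaling(ai_grades, teacher_grades)

# Apply the fitted line to bring the AI grades onto the teachers' scale.
rescaled = [slope * g + intercept for g in ai_grades]

print(f"correlation: {r:.2f}")
print(f"mean AI grade: {mean(ai_grades):.2f}")
print(f"mean teacher grade: {mean(teacher_grades):.2f}")
print(f"mean rescaled AI grade: {mean(rescaled):.2f}")
```

In this toy example the AI grades are lower on average but rank the exams almost the same way as the teacher, so a single linear adjustment brings the two sets of grades into line. Under the same assumptions, a student whose teacher-assigned grade falls far below the rescaled AI grade could be flagged for a second look.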
