Automated feedback systems are becoming increasingly important in programming education as class sizes grow and instructor resources remain limited. Recent advances in large language models (LLMs) offer educators a practical way to provide students with structured feedback on a wide range of assignments. In a pre-experiment, four student researchers solved Project Euler problems and revised their code using feedback generated by Claude 3.5 Sonnet, improving their scores by an average of 17.5 points on a 100-point rubric. Notable gains were also observed in time complexity, efficiency, and edge-case handling, with percentage increases of 24.45%, 22.59%, and 22%, respectively. Building on these results, we designed a classroom-based experiment involving students across several programming courses. Students will be divided into a control group (human feedback) and a treatment group (LLM feedback), with the feedback graded using a 14-criterion rubric. Claude 3.7 Sonnet will serve as the LLM in this study, as it is the most recent model released by Anthropic. The study evaluates both quantitative score improvements and students’ perceptions of feedback quality, and its results aim to inform the integration of LLMs into educational assessment practices.