Manual grading of academic assignments, even when carried out by a single instructor, can be affected by subjective factors such as fatigue, fluctuating interpretations of criteria, and variations in mood, leading to inconsistent grades and limiting the quality of feedback for students. This study addressed these challenges through a semester-long training and continuous validation of multiple standard artificial intelligence (AI) models, including ChatGPT, Gemini, and Claude, aligned with disciplinary rubrics to automatically, objectively, and efficiently review and grade assessments.
A quasi-experimental study was conducted with 120 students from the information systems course in an industrial engineering program, distributed into two groups during the first semester of 2025. Both groups were taught by the same instructor, which enabled the isolation of intra-rater subjectivity and the control of pedagogical variables. The control group (n = 60) was assessed using traditional methods, while the experimental group (n = 60) used an AI system for assessment and feedback. A mixed-methods approach was employed, combining comparative statistical analysis, measurement of grade variability, semi-structured interviews, and focus groups.
The results showed a significant reduction in grade variability in the AI-assessed course compared to the traditional method (standard deviation: 6.8 vs. 12.4 points, F = 18.7, p < 0.001). The correlation between initial and final grades was significantly higher in the AI group (r = 0.89 vs. r = 0.62, p < 0.001), indicating greater evaluative consistency. The average time to deliver feedback was reduced from 72 to 24 hours. In addition, AI achieved 99.5% evaluative accuracy, allowing the instructor to limit reviews to 1 out of every 100 assessments. Students assessed with AI reported higher satisfaction with feedback consistency (4.2 vs. 3.1 on a 5-point Likert scale, p < 0.01), though they valued personalized qualitative feedback less.
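As a rough plausibility check on the reported correlation difference, the comparison can be sketched with Fisher's z-transformation. This is a minimal sketch, not the authors' analysis. It assumes the two groups are independent and that each correlation is based on all 60 students, and the abstract does not say which test was actually used. The function names are hypothetical. The block also computes the simple ratio of variances implied by the two standard deviations. The reported F = 18.7 is not this simple ratio, so it presumably comes from a procedure the abstract does not name.

```python
import math
from statistics import NormalDist


def fisher_z_test(r1, n1, r2, n2):
    """Two-sided test for the difference between two independent Pearson correlations.

    Assumes independent samples; n1 and n2 are the number of paired observations.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher z-transform of each r
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))       # two-sided normal p-value
    return z, p


def variance_ratio(sd_larger, sd_smaller):
    """Ratio of variances implied by two standard deviations."""
    return (sd_larger / sd_smaller) ** 2


# Values taken from the abstract; n = 60 per group is an assumption for the correlations.
z, p = fisher_z_test(0.89, 60, 0.62, 60)
ratio = variance_ratio(12.4, 6.8)

print(f"Fisher z = {z:.2f}, two-sided p = {p:.4f}")
print(f"Variance ratio (traditional / AI) = {ratio:.2f}")
```

Under these assumptions the z statistic is roughly 3.7, which is consistent with the reported p < 0.001 for the difference in correlations.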
The implementation of artificial intelligence proved effective in reducing the subjectivity inherent in human assessment, even when performed by a single instructor, and in improving operational efficiency. However, opportunities were identified to further personalize qualitative feedback in order to optimize the educational experience. The findings suggest that a hybrid model that combines AI for objective assessment with human intervention for formative feedback could maximize the benefits of both approaches.
https://orcid.org/0000-0002-7248-4492
Universidad Andres Bello, Viña del Mar, Chile
The full paper will be available to logged in and registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026