This full paper describes the use and validation of feedback provided by an AI tool to support students’ technical writing abilities. A tool powered by generative AI is being developed to generate formative feedback on students’ initial writing samples. The project is part of a larger study addressing the challenge of providing students with rich, informative feedback so they can improve the quality of their writing artifacts before submitting a final draft for review by the instructional team. Formative feedback is an ongoing assessment process aimed at improving students’ understanding of subject matter. It enables students to identify their strengths and weaknesses throughout their learning journey and assists instructors in evaluating the effectiveness of their teaching methods in achieving learning objectives. However, providing such feedback in large classrooms can pose significant challenges for instructors, particularly for complex assignments such as essay writing, report writing, and proposal writing. Even with the support of an instructional team, the process can be time-consuming and add to instructor workload.
To address these challenges, educational researchers and technologists have explored the use of technology for automated writing evaluation (AWE). This exploration has led to the emergence of tools like ChatGPT, which has gained popularity in recent years for generating formative feedback based on user-defined prompts. Numerous studies have examined ChatGPT’s ability to assess student essays across various educational levels, often comparing its feedback quality to that of human evaluators.
In contrast, our study investigates Charlie, an AI-powered teaching tool specifically designed to provide real-time feedback based on criteria established in instructor rubrics. This paper reports the methods used to evaluate the effectiveness of formative feedback generated by Charlie, compared with feedback from human evaluators, in guiding improvements in student writing. The initial study analyzed feedback samples from five essays in which first-year engineering students articulated their interests in specific engineering majors and developed actionable plans to achieve their career goals.
Charlie allows students to resubmit their essays multiple times; we therefore selected samples based on the number of iterations and the time interval between submissions. In total, we analyzed 19 feedback samples from the five essays, with the number of submissions per essay ranging from two to five. These samples were then compared with human feedback using interrater reliability measures to assess consistency and agreement across feedback types. To minimize bias, human evaluators first assessed the essays without access to Charlie’s feedback.
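As a concrete illustration of this kind of interrater reliability analysis, the sketch below computes Cohen’s kappa, one common agreement statistic, between Charlie’s rubric scores and a human evaluator’s scores for the same feedback samples. This is a minimal sketch under stated assumptions, not the paper’s actual analysis: the abstract does not specify which reliability measure was used, and the rubric scale, scores, and variable names here are hypothetical.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who scored the same items on a categorical (e.g., rubric) scale."""
    assert len(rater_a) == len(rater_b) and rater_a, "need paired, nonempty ratings"
    n = len(rater_a)
    # Observed agreement: fraction of items where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical rubric scores (1-4) for the 19 feedback samples.
charlie_scores = [3, 4, 2, 3, 3, 4, 2, 1, 3, 4, 3, 2, 4, 3, 3, 2, 4, 3, 2]
human_scores   = [3, 4, 2, 2, 3, 4, 3, 1, 3, 4, 3, 2, 4, 4, 3, 2, 4, 3, 2]
print(f"Cohen's kappa: {cohen_kappa(charlie_scores, human_scores):.3f}")
```

By the conventional Landis and Koch benchmarks, kappa values between 0.61 and 0.80 are read as substantial agreement, which in this setting would suggest that the AI-generated feedback tracks human rubric judgments reasonably closely.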
Preliminary results provide valuable insights into the comparative quality of feedback from Charlie and human evaluators, highlighting both strengths and potential gaps in Charlie’s performance. We anticipate that our study will contribute to the growing body of literature on generative AI in education, particularly in providing scalable, timely, and relevant formative feedback for writing assessments.