In intermediate programming courses, students often work on large, multi-week programming projects with complex, multi-page specifications. Complex specifications provide opportunities for students to misunderstand requirements, leading to programs that pass student tests but fail reference tests, even if student tests have high coverage metrics or other markers of potential success. In such a situation, a student cannot find a "bug" in their program, since the program is internally consistent with their understanding of the requirements. As a consequence, students often attend office hours or use other help-seeking course resources to understand why their code fails hidden reference tests. This can become a major burden on teaching staff, and can lead students to ask for details about the reference tests that are not appropriate for staff to provide.
To help students better check their understanding of project specifications, we augmented our automated feedback system. The new feedback notifies students when their test cases fail against the reference implementation. This is important because it indicates that student tests assert a claim that does not align with the requirements as implied by the reference implementation.
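The idea of checking student tests against a reference implementation can be illustrated with a minimal sketch. This is not the plugin described in this paper; the function names (`reference_median`, `student_test_odd`, `student_test_even`, `validate_tests`) and the median example are hypothetical, chosen only to show how a test that encodes a misread requirement is flagged.

```python
from typing import Callable, List, Sequence

# Hypothetical reference implementation: the spec (by assumption) says the
# median of an even-length list is the mean of the two middle values.
def reference_median(values: Sequence[float]) -> float:
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical student tests; each takes the implementation under test.
def student_test_odd(impl: Callable) -> None:
    assert impl([3, 1, 2]) == 2

def student_test_even(impl: Callable) -> None:
    # Encodes a misunderstanding: expects the lower middle element.
    assert impl([1, 2, 3, 4]) == 2

def validate_tests(tests: List[Callable], reference: Callable) -> List[str]:
    """Run each student test against the reference implementation and
    return the names of tests whose assertions the reference fails."""
    invalid = []
    for test in tests:
        try:
            test(reference)
        except AssertionError:
            invalid.append(test.__name__)
    return invalid

flagged = validate_tests([student_test_odd, student_test_even], reference_median)
print(flagged)
```

A flagged test tells the student that the claim the test asserts disagrees with the reference behavior, without revealing anything about the hidden reference tests themselves.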
We analyzed how students' use of this system affected their project grades. The correctness of the students' final submissions was measured on three projects across four semesters: two pre-intervention (Spring 2024, Fall 2024) and two post-intervention (Spring 2025, Fall 2025). Pairwise Mann-Whitney U tests indicated that submissions on projects from semesters using the test validation system generally achieved higher reference test pass rates than those from semesters without the system.
Several comparisons showed statistically significant differences (U ~ 49,600-104,400, p < 0.001, N > 350). Effect sizes ranged from 0.16 to 0.40, indicating a small to moderate positive association between the use of the validation system and improved correctness outcomes. Adoption was initially low while the tool was optional, so subsequent projects required its use to ensure more consistent engagement. This paper presents the challenges and successes of deploying this test validation plugin for large programming projects and offers recommendations for instructors seeking to scaffold student testing practices to provide more robust test-adequacy criteria.
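For readers unfamiliar with the statistic, the following sketch shows how a Mann-Whitney U value and one common effect size can be computed for two groups of pass rates. It rests on assumptions: the abstract does not state which effect-size measure was used, so the rank-biserial correlation here is only one possibility, the pass rates are made-up illustrative values, and the p-value computation is omitted.

```python
from typing import Sequence

def mann_whitney_u(post: Sequence[float], pre: Sequence[float]) -> float:
    """U statistic for `post`: the number of (post, pre) pairs in which the
    post value is larger, with ties counted as half a pair."""
    u = 0.0
    for x in post:
        for y in pre:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def rank_biserial(post: Sequence[float], pre: Sequence[float]) -> float:
    """Rank-biserial correlation in [-1, 1]; positive values mean post
    values tend to exceed pre values. (Assumed effect-size measure.)"""
    u = mann_whitney_u(post, pre)
    return 2.0 * u / (len(post) * len(pre)) - 1.0

# Hypothetical reference-test pass rates, not data from the study.
pre_rates = [0.50, 0.60, 0.70, 0.80]
post_rates = [0.65, 0.75, 0.85, 0.90]

print(mann_whitney_u(post_rates, pre_rates))  # 13.0
print(rank_biserial(post_rates, pre_rates))   # 0.625
```

The pair-counting form is quadratic in the sample sizes and is written for clarity; a rank-based formulation or a statistics library would be the usual choice at the sample sizes reported above.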
http://orcid.org/0000-0002-2143-2633
Virginia Polytechnic Institute and State University
http://orcid.org/0000-0002-5162-9314
Virginia Polytechnic Institute and State University
The full paper will be available to logged in and registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026