2026 ASEE Annual Conference & Exposition

Your Test Cases Are Lying to You: Validating Student Software Tests Against Reference Implementations to Reinforce Specification Alignment

Presented at Computers in Education (CoED): AI in Education (3 of 9) -- M508B

In intermediate programming courses, students often work on large, multi-week programming projects with complex, multi-page specifications. Complex specifications provide opportunities for students to misunderstand requirements, leading to programs that pass student tests but fail reference tests, even if the student tests have high coverage metrics or other markers of potential success. In such a situation, a student cannot find a "bug" in their program, since the program is internally consistent with their understanding of the requirements. As a consequence, students often attend office hours or use other help-seeking course resources to understand why their code fails hidden reference tests. This can become a major burden on teaching staff, and can lead students to ask for details about the reference tests that are not appropriate for staff to provide.

To help students better check their understanding of project specifications, we augmented our automated feedback system. The new feedback notifies students when their test cases fail against the reference implementation. This is important because it indicates that student tests assert a claim that does not align with the requirements as implied by the reference implementation.
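The idea of validating tests against a reference can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' plugin: the function names (`reference_average`, `check_empty_raises`, `validate_tests`) and the example specification (an empty list averages to 0.0) are hypothetical, and each student test is assumed to take the implementation under test as a parameter.

```python
from typing import Callable, Dict, List

Implementation = Callable[[List[float]], float]


def reference_average(values: List[float]) -> float:
    """Hypothetical reference solution: the spec says an empty list averages to 0.0."""
    if not values:
        return 0.0
    return sum(values) / len(values)


def check_average_of_two(average: Implementation) -> None:
    """Student test that agrees with the specification."""
    assert average([2.0, 4.0]) == 3.0


def check_empty_raises(average: Implementation) -> None:
    """Student test based on a misreading: it expects an exception for an empty list."""
    try:
        average([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty list")


def validate_tests(
    tests: List[Callable[[Implementation], None]],
    implementation: Implementation,
) -> Dict[str, str]:
    """Run each test against the implementation and collect the ones that fail."""
    failures: Dict[str, str] = {}
    for test in tests:
        try:
            test(implementation)
        except Exception as exc:  # any failure means the test disagrees with the reference
            failures[test.__name__] = f"{type(exc).__name__}: {exc}"
    return failures


# Tests that fail here assert something the reference implementation does not do.
misaligned = validate_tests([check_average_of_two, check_empty_raises], reference_average)
for name, reason in misaligned.items():
    print(f"{name} fails against the reference: {reason}")
```

In this sketch, a student whose own program raises `ValueError` on an empty list would pass both of their tests, yet the second test is flagged because it contradicts the reference behavior.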

We analyzed how students' use of this system affected their project grades. The correctness of the students' final submissions was measured on three projects across four semesters: two pre-intervention (Spring 2024, Fall 2024) and two post-intervention (Spring 2025, Fall 2025). Pairwise Mann-Whitney U tests indicated that submissions on projects from semesters using the test validation system generally achieved higher reference test pass rates than those from semesters without the system. Several comparisons showed statistically significant differences (U between approximately 49,600 and 104,400, p < 0.001, N > 350). Effect sizes ranged from 0.16 to 0.40, indicating a small to moderate positive association between use of the validation system and improved correctness outcomes. Adoption was low when the tool was first offered as an optional mechanism; subsequent projects required its use to ensure more consistent engagement. This paper presents the challenges and successes of deploying this test validation plugin for large programming projects and offers recommendations for instructors seeking to scaffold student testing practices to provide more robust test-adequacy criteria.
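For readers unfamiliar with the statistic, the following sketch shows how a Mann-Whitney U value and one common effect size can be computed from two groups of pass rates. It rests on assumptions: the abstract does not say which effect size measure was used, so the rank-biserial correlation here is only one possibility, the pass rates are made-up toy values rather than study data, and the significance test (p-value) is omitted.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(post: Sequence[float], pre: Sequence[float]) -> Tuple[float, float]:
    """Return (U for the post group, rank-biserial correlation)."""
    n_post, n_pre = len(post), len(pre)
    ranks = average_ranks(list(post) + list(pre))
    rank_sum_post = sum(ranks[:n_post])
    u_post = rank_sum_post - n_post * (n_post + 1) / 2
    # Positive values mean the post group tends to have higher pass rates.
    rank_biserial = 2 * u_post / (n_post * n_pre) - 1
    return u_post, rank_biserial


# Toy reference-test pass rates (illustrative only, not data from the study).
pre_pass_rates = [0.50, 0.60, 0.70, 0.80]
post_pass_rates = [0.65, 0.75, 0.85, 0.90]

u_value, effect = mann_whitney_u(post_pass_rates, pre_pass_rates)
print(f"U = {u_value}, rank-biserial effect size = {effect}")
```

With these toy values, 13 of the 16 post/pre pairs favor the post group, giving U = 13 and an effect size of 0.625.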

Authors

Alexander Hicks http://orcid.org/0000-0002-2143-2633 Virginia Polytechnic Institute and State University

Alex Hicks (he/him) is a PhD Candidate in the Department of Computer Science at Virginia Tech. He earned a Bachelor of Science in Computer Science and History from the University of Virginia in 2020. His research interests include help-seeking behavior, broadening participation in computer science, and automated feedback systems.
Prof. Stephen H Edwards http://orcid.org/0000-0002-5162-9314 Virginia Polytechnic Institute and State University

Stephen H. Edwards is a Professor and the Associate Department Head for Undergraduate Studies in the Department of Computer Science at Virginia Tech, where he has been teaching since 1996. He received his B.S. in electrical engineering from Caltech, and M.S. and Ph.D. in computer and information science from The Ohio State University. His research interests include computer science education, software testing, software engineering, and programming languages. He is the project lead for Web-CAT, the most widely used open-source automated grading system in the world. Web-CAT is known for allowing instructors to grade students based on how well they test their own code. In addition, his research group has produced a number of other open-source tools used in classrooms at many other institutions. Currently, he is researching innovative methods for giving feedback to students as they work on assignments to provide a more welcoming experience for students, recognizing the effort they put in and the accomplishments they make as they work on solutions. The goals of his research are to strengthen growth mindset beliefs while encouraging deliberate practice, self-checking, and skill improvement as students work.
Prof. Cliff Shaffer Virginia Polytechnic Institute and State University

Dr. Shaffer received his PhD in Computer Science from University of Maryland, College Park in 1986. He is currently Professor of Computer Science at Virginia Tech, where he has been since 1987. He directs the AlgoViz and OpenDSA projects, whose goals resp

Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.
