2026 ASEE Annual Conference & Exposition

Learning to Self-Evaluate: How Students Calibrate Self-Assessment in Structured Engineering Studio Environments

Presented at Engaging Classroom Environments for Significant Learning

Self-assessment practices have been widely recognized for their potential to deepen conceptual understanding, build metacognitive capacity, and enhance student self-efficacy. When paired with well-designed rubrics, these practices can help students take ownership of their learning by providing clear criteria for evaluating their own progress. Yet the success of these practices varies considerably across disciplines and instructional contexts. This study examines how biomedical engineering students develop self-assessment skills across four iterative design studios using a Proficiency Rubric for Integrated Skills Measurement (PRISM), which assesses engineering design competencies across four proficiency levels. Following each studio, students individually evaluated their team's collaborative work using the same rubric the instructors used, providing parallel assessments for systematic comparison. We analyzed paired student-instructor ratings from 64 students in a junior-level BME course to investigate: (1) alignment between student self-assessments and instructor evaluations, (2) changes in alignment from Studio 1 to Studio 4, and (3) variation among team members evaluating the same deliverable. Results showed modest overall alignment (40.6% exact agreement; Gwet's AC1 = 0.208), with 86% of ratings falling within ±1 proficiency level. Contrary to expectations, calibration worsened over time: the mean student-instructor difference increased from 0.15 in Studio 1 to 0.80 in Studio 4, driven by students' growing tendency to overestimate competence in constraint analysis and in connecting quantitative models to qualitative representations. These findings demonstrate that familiarity with assessment criteria alone does not produce calibration; students require explicit instruction in comparing their judgments to expert standards and justifying ratings with concrete evidence. Persistent within-team variation further suggests that shared deliverables do not automatically generate shared understanding: teammates attend to different evidence and interpret rubric criteria differently, particularly for competencies requiring translation between qualitative and quantitative representations. We discuss implications for designing calibration-focused interventions, including structured opportunities for students to compare self-ratings with instructor evaluations, team-based discussions of divergent ratings, and competency-specific scaffolds that guide evidence-based justification.
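
For readers unfamiliar with the agreement statistics cited above, the sketch below illustrates how exact agreement, within-±1-level agreement, and Gwet's AC1 can be computed from paired student-instructor ratings. This is not the authors' analysis code; the ratings shown are hypothetical, and the variable names and the 1-4 PRISM scale coding are assumptions made only for illustration.

```python
import numpy as np

def gwet_ac1(rater_a, rater_b, categories):
    """Gwet's AC1 chance-corrected agreement coefficient for two raters."""
    rater_a, rater_b = np.asarray(rater_a), np.asarray(rater_b)
    n, q = len(rater_a), len(categories)
    # Observed proportion of exact agreement
    p_a = np.mean(rater_a == rater_b)
    # Average proportion of ratings in each category, pooled over both raters
    pi = np.array([((rater_a == c).sum() + (rater_b == c).sum()) / (2 * n)
                   for c in categories])
    # Gwet's chance-agreement probability
    p_e = np.sum(pi * (1 - pi)) / (q - 1)
    return (p_a - p_e) / (1 - p_e)

# Hypothetical paired ratings on a four-level proficiency scale (1 = lowest, 4 = highest)
student = np.array([2, 3, 3, 4, 2, 3, 4, 2])
instructor = np.array([2, 2, 3, 3, 2, 4, 3, 2])

exact = np.mean(student == instructor)                  # exact agreement
adjacent = np.mean(np.abs(student - instructor) <= 1)   # within +/-1 level
ac1 = gwet_ac1(student, instructor, categories=[1, 2, 3, 4])
print(f"Exact: {exact:.3f}, within ±1: {adjacent:.3f}, AC1: {ac1:.3f}")
```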

Authors
  1. Stephanie Fuchs, Cornell University
  2. Prof. Jonathan T. Butcher, Cornell University
Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.