Engineering education is rapidly adopting generative artificial intelligence (GenAI) tools that promise faster, more consistent assessment, yet their reliability in discipline-specific contexts remains uncertain. This mixed-methods study compared ChatGPT-4, Claude 3.5, and Perplexity AI across four undergraduate engineering assignments (two lower-level, two upper-level). Quantitative analyses (one-way ANOVA followed by Tukey's HSD, α = .05) contrasted AI-assigned scores with expert grades, while qualitative feedback from faculty and students captured perceptions of clarity, fairness, and workload. ChatGPT-4 closely mirrored expert grades on complex tasks (|Δ| ≤ 3.5%), whereas Claude 3.5 and Perplexity AI under-scored upper-level work by as much as 27%. Stakeholders valued the consistent rubric application and faster turnaround but criticized the models' rigidity and opaque rationales. These findings support a hybrid approach in which AI tools provide baseline scores and instructors supply higher-order judgement. Further research should examine discipline-specific fine-tuning and the long-term impact of AI-assisted grading on student learning and educator workload.
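
For readers who wish to replicate the style of quantitative comparison described above, the following is a minimal sketch of a one-way ANOVA followed by Tukey's HSD in Python (SciPy and statsmodels). The score arrays and group sizes are hypothetical placeholders for illustration only; they do not reproduce the study's data.

```python
# Sketch of the ANOVA + Tukey HSD comparison described in the abstract.
# All score values below are hypothetical placeholders, not the study's data.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical per-submission scores (percent) for one upper-level assignment.
expert     = np.array([88, 91, 85, 90, 87, 92, 89, 86])
chatgpt4   = np.array([86, 90, 84, 88, 85, 91, 88, 84])
claude35   = np.array([70, 74, 66, 71, 68, 73, 69, 65])
perplexity = np.array([72, 75, 68, 73, 70, 74, 71, 67])

# One-way ANOVA: do mean scores differ across graders?
f_stat, p_value = f_oneway(expert, chatgpt4, claude35, perplexity)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Tukey's HSD (alpha = .05): which grader pairs differ significantly?
scores = np.concatenate([expert, chatgpt4, claude35, perplexity])
graders = (["Expert"] * len(expert) + ["ChatGPT-4"] * len(chatgpt4)
           + ["Claude 3.5"] * len(claude35) + ["Perplexity"] * len(perplexity))
print(pairwise_tukeyhsd(scores, graders, alpha=0.05))
```

In this setup, a significant omnibus F-test licenses the pairwise Tukey comparisons, which is how an under-scoring model (e.g., one whose mean differs from the expert mean) would be flagged while controlling the family-wise error rate at α = .05.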