2026 ASEE Annual Conference & Exposition

WIP: Low Effort, High Grades? Benchmarking LLMs on Various Engineering Assignments

Presented at Computers in Education (CoED): Poster Session - Division Special Events (1 of 4) -- M208

Large language models (LLMs) are advancing rapidly. Recent systems support explicit reasoning through chain-of-thought prompting, multimodal understanding of files and images, code generation, and document production (e.g., slide decks, spreadsheets, and PDFs). Instructors are increasingly concerned that students can use LLMs to complete a wide range of assignments with minimal effort yet still earn high grades, potentially diminishing meaningful learning. At the same time, systematic evidence remains limited on whether the LLMs in widespread use today can complete authentic coursework end-to-end, that is, from prompt to final solution without human intervention.

In this paper, we evaluate six state-of-the-art LLMs commonly used by students on 24 authentic assignments drawn from engineering courses at a public research university, spanning introductory programming, numerical methods, and computational algorithms and ranging from introductory to upper-division level. To assess the effect of prompt input modality, we compare text-based and image-based prompts. To capture the diversity of assignment types, we organize tasks into multiple-choice questions (MCQs), code-writing tasks, and constructed-response questions, which we further divide into long-form algorithm design questions and short-form conceptual analysis questions. MCQ and code-writing submissions are auto-graded through the university’s online assessment platform. Algorithm-design responses are graded against a predefined rubric by two engineering graduate students with advanced training in algorithms; discrepancies are resolved through joint review. Conceptual-analysis responses are added to each course’s regular grading pool, mixed with student submissions, and evaluated by the course teaching assistant without knowledge of authorship, to minimize bias.
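
To make the two input modalities concrete, the sketch below shows how such a benchmarking harness might issue text-based and image-based prompts for the same assignment. It is a minimal illustration, not the authors' actual setup: it assumes an OpenAI-compatible chat API, and the model identifiers and sample assignment text are hypothetical placeholders.

```python
import base64
from pathlib import Path

from openai import OpenAI  # assumption: an OpenAI-compatible chat API

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def text_messages(assignment_text: str) -> list[dict]:
    """Build a text-based prompt from the assignment statement."""
    return [{"role": "user", "content": assignment_text}]


def image_messages(image_path: Path) -> list[dict]:
    """Build an image-based prompt from a screenshot of the same assignment."""
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Solve the assignment shown in the image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }]


def run_assignment(model: str, messages: list[dict]) -> str:
    """Submit one assignment to one model and return its raw answer."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


# Placeholder model names and assignment text, for illustration only.
for model in ["model-a", "model-b"]:
    answer = run_assignment(model, text_messages("Implement binary search."))
    # In the study, each answer would then be routed to the auto-grader
    # or added, blinded, to the human grading pool described above.
    print(model, answer[:80])
```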

All six models achieve near-ceiling performance on multiple-choice, code-writing, and long-form algorithm design questions. Performance on conceptual analysis questions is more variable, though generally high. Models perform similarly under text-based and image-based prompts. These preliminary findings provide insight into the capabilities and limitations of current LLMs when applied to authentic engineering coursework.

Authors
  1. Mr. Yuxuan Chen (http://orcid.org/0009-0009-2159-8746), University of Illinois Urbana-Champaign
  2. Siegfried Eggl, University of Illinois Urbana-Champaign
  3. Dr. Abdussalam Alawini, University of Illinois Urbana-Champaign
  4. Prof. Mariana Silva, University of Illinois Urbana-Champaign
  5. Max Fowler (http://orcid.org/0000-0002-4730-447X), University of Illinois Urbana-Champaign
  6. Abhishek Umrawal (http://orcid.org/0000-0003-4460-7499), University of Illinois Urbana-Champaign
  7. Melkior Ornik, University of Illinois Urbana-Champaign
Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.


For those interested in:

  • computer science
  • engineering technology
  • Faculty