2026 ASEE Annual Conference & Exposition

A Cross-Disciplinary Study Evaluating the Effectiveness of GenAI created Multiple Choice Questions

Presented at Computers in Education (CoED): Learning, Engagement & Inclusion (2 of 9) -- M408B

As GenAI tools become increasingly available in education, instructors are actively exploring how to leverage AI to manage growing workloads, particularly in assessment design, where creating and maintaining question banks for homework, quizzes, and exams requires considerable time and effort. However, the effectiveness of AI-generated assessments remains an open question, with instructors uncertain about both question quality and the reliability of the AI system.

Multiple choice questions (MCQs) represent a particularly compelling use case for AI automation given their widespread use, objectivity, and need for large question banks. However, poorly engineered prompts can negate the advantages of using AI by generating questions that are flawed or irrelevant, that take more effort to evaluate, and that often need extensive edits. Systematic frameworks for evaluating AI-generated MCQ effectiveness and empirical evidence of GenAI's adherence to quality guidelines remain limited.

We introduce an AI-powered MCQ generation tool that creates assessments within an online interactive textbook platform. The tool uses GPT-4.1 with structured prompts that incorporate explicit guardrails and learning science principles to generate high-quality MCQs from instructor-selected content.

To evaluate whether GenAI reliably adheres to these prompt-based guardrails, we present a multi-dimensional framework and apply it to over 500 questions generated for introductory courses in Computer Science, Mathematics, and Data Science. Our framework evaluates the efficacy of MCQs based on quality and relevance, each of which is defined by several quantifiable metrics. Additionally, we classify MCQs using Bloom's taxonomy to identify patterns in the cognitive complexity of AI-generated questions. We also investigate the correlation between Bloom's taxonomy level and MCQ efficacy to expose how AI handles question generation at different Bloom's levels.
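As an illustration only, the correlation step described above could be sketched as follows. The abstract does not say which statistic the authors use or how efficacy is aggregated, so this sketch makes several assumptions: each question carries one numeric efficacy score (hypothetical), Bloom's levels are treated as an ordinal scale from 1 (remember) to 6 (create) following the revised taxonomy, and Spearman rank correlation is used because it suits ordinal data. All function names are hypothetical.

```python
from math import sqrt

# Revised Bloom's taxonomy, ordered from lowest to highest cognitive level.
BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


def bloom_efficacy_correlation(questions):
    """Spearman correlation between Bloom's level and efficacy score.

    `questions` is an iterable of (bloom_level_name, efficacy_score) pairs;
    the single efficacy score per question is an assumption of this sketch.
    """
    questions = list(questions)
    levels = [BLOOM_LEVELS.index(level) + 1 for level, _ in questions]
    scores = [score for _, score in questions]
    # Spearman correlation is the Pearson correlation of the rank vectors.
    return pearson(average_ranks(levels), average_ranks(scores))
```

Under these assumptions, a coefficient near -1 would mean efficacy tends to fall as the Bloom's level rises, a coefficient near +1 would mean it tends to rise, and a value near 0 would suggest no monotonic relationship.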

Overall, our findings reveal the extent to which GenAI adheres to prompt-based guardrails and generates effective MCQs. We present both strengths and systematic failure patterns that inform best practices for GenAI-assisted assessment design.

Authors

Erica Perich, zyBooks, A Wiley Brand

Erica Perich is Authoring and Pedagogy lead at zyBooks. She works on incorporating learning science into pedagogical practices for authoring across disciplines. She has also extensively contributed as an author to a leading online IT digital textbook. She earned a BS in Mathematics Education from Brigham Young University in 2013 and an MS in Information Technology from Southern New Hampshire University in 2023. While teaching secondary mathematics, she gained a passion for finding innovative ways to facilitate students’ conceptual understanding.

Dr. Yamuna Rajasekhar, zyBooks, A Wiley Brand

Yamuna Rajasekhar is a senior manager of Content at zyBooks, a Wiley Brand. She is an author and contributor to various zyBooks titles. She was formerly an assistant professor of Electrical and Computer Engineering at Miami University. She received her M.S. and Ph.D. in Electrical and Computer Engineering from UNC Charlotte.

Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.
