In this NSF Grantee Poster Session paper, we report on Year 3 of a collaboration between machine learning researchers at [institution blinded for peer review] and engineering education researchers at [institution blinded for peer review], funded by the NSF Research in the Formation of Engineers (RFE) program. We aim to develop a Generative Artificial Intelligence (GenAI) assistant for the [tool name blinded for peer review] that can automate the analysis of short-answer justifications to concept questions in engineering thermodynamics and mechanics. Concept questions, sometimes referred to as ConcepTests [1], are challenging single-right-answer multiple-choice questions that ask students to apply recently learned concepts to novel scenarios. When coupled with a short-answer justification task, concept questions have been shown to promote deeper understanding, better prepare students for in-class discussion, and improve learning outcomes [2], [3]. Analyzing short-answer justifications to concept questions thus provides instructors and researchers with a wealth of information about student understanding. However, manual analysis at this scale is resource-intensive, motivating our interest in using large language models (LLMs) to assist with qualitative data analysis.
Various machine learning approaches have been used to assess short- and long-form written student texts, automate grading, and offer personalized learning for students [4]. However, the use of Transformer-based LLMs for the analysis of student text is still emergent. We collected over 3,000 short-answer justifications to concept questions in mechanics and thermodynamics courses from students across diverse institutions through the [tool name blinded for peer review] [5], [6], [7], a free, web-based active learning tool and content repository. Manual coding followed a two-stage process [8] with a resources framing [9]. We then trained Mixtral-of-Experts [10], Llama-3 [11], GPT-4 [12], GPT-4o-mini [13], and Phi-3.5-mini [14] to automate the coding. We scaffold the project with the following research questions:
1. What ideas do students use to explain their reasoning when writing short-answer responses to conceptually challenging questions?
2. How well do Transformer-based machine learning models replicate the human-coded data?
We summarize our recent work, in which we observed that Mixtral and Llama-3 performed best on in-domain analyses (e.g., trained on thermodynamics and tested on thermodynamics), while GPT-4 performed robustly in both in-domain and cross-domain (e.g., trained on thermodynamics and tested on mechanics) applications [7]. We also detail our emergent co-design process with instructors who use the [tool name blinded for peer review]. Co-design gives us insight into instructor values and practices, which in turn informs the development of the GenAI assistant. More broadly, we use this work to advocate for educational technology that emphasizes the processes, rather than the products, of student learning.
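As an illustrative aside for the second research question: one standard way to quantify how well a model replicates human-coded labels is a chance-corrected agreement statistic such as Cohen's kappa. The sketch below is not the metric or codebook reported in the study; the code labels ("energy", "entropy", "work") and the tiny sample are hypothetical, purely to show the computation.

```python
from collections import Counter

def cohens_kappa(human, model):
    """Chance-corrected agreement between two coders over the same items."""
    assert len(human) == len(model) and len(human) > 0
    n = len(human)
    # Observed proportion of items where the two coders agree
    observed = sum(h == m for h, m in zip(human, model)) / n
    # Expected agreement by chance, from each coder's marginal label frequencies
    h_freq, m_freq = Counter(human), Counter(model)
    expected = sum(h_freq[c] * m_freq.get(c, 0) for c in h_freq) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned to six student justifications
human = ["energy", "entropy", "energy", "work", "entropy", "energy"]
model = ["energy", "entropy", "work", "work", "entropy", "energy"]
print(round(cohens_kappa(human, model), 3))  # 0.75
```

Kappa near 1 indicates the model reproduces the human codes well beyond chance; values near 0 indicate chance-level agreement.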
References
[1] E. Mazur, Peer Instruction: A User’s Manual. in Series in Educational Innovation. Prentice Hall, 1997.
[2] M. D. Koretsky, B. J. Brooks, R. M. White, and A. S. Bowen, “Querying the questions: Student responses and reasoning in an active learning class,” J. Eng. Educ., vol. 105, no. 2, pp. 219–244, 2016, doi: 10.1002/jee.20116.
[3] M. D. Koretsky, B. J. Brooks, and A. Z. Higgins, “Written justifications to multiple-choice concept questions during active learning in class,” Int J. Sci. Educ., vol. 38, no. 11, pp. 1747–1765, Jul. 2016, doi: 10.1080/09500693.2016.1214303.
[4] X. Zhai, “Practices and Theories: How can machine learning assist in innovative assessment practices in science education,” J. Sci. Educ. Technol., vol. 30, no. 2, pp. 139–149, Apr. 2021, doi: 10.1007/s10956-021-09901-8.
[5] H. Auby, N. Shivagunde, A. Rumshisky, and M. D. Koretsky, “WIP: Using machine learning to automate coding of student explanations to challenging mechanics concept questions,” in Proceedings of the 2022 American Society of Engineering Education Annual Conference & Exposition, Jun. 2022. [Online]. Available: https://peer.asee.org/40507
[6] H. Auby, N. Shivagunde, A. Rumshisky, and M. Koretsky, “Using machine learning to analyze short-answer responses to conceptually challenging chemical engineering thermodynamics questions,” in Proceedings of the 2024 American Society of Engineering Education Annual Conference & Exposition, Portland, Oregon, Jun. 2024. [Online]. Available: https://peer.asee.org/48236
[7] H. Auby, N. Shivagunde, V. Deshpande, A. Rumshisky, and M. D. Koretsky, “Analysis of student understanding in short-answer explanations to concept questions using a human-centered AI approach,” J. Eng. Educ., vol. 114, no. 4, p. e70032, 2025, doi: 10.1002/jee.70032.
[8] J. Saldaña, The Coding Manual for Qualitative Researchers. SAGE Publications, 2021.
[9] D. Hammer, “Student resources for learning introductory physics,” Am. J. Phys., vol. 68, no. S1, pp. S52–S59, Jul. 2000, doi: 10.1119/1.19520.
[10] A. Q. Jiang et al., “Mixtral of Experts,” Jan. 08, 2024, arXiv: arXiv:2401.04088. doi: 10.48550/arXiv.2401.04088.
[11] Llama Team and AI @ Meta, “The Llama 3 Herd of Models,” Jul. 2024, arXiv: arXiv:2407.21783. doi: 10.48550/arXiv.2407.21783.
[12] OpenAI et al., “GPT-4 Technical Report,” Mar. 04, 2024, arXiv: arXiv:2303.08774. doi: 10.48550/arXiv.2303.08774.
[13] OpenAI et al., “GPT-4o System Card,” Oct. 25, 2024, arXiv: arXiv:2410.21276. doi: 10.48550/arXiv.2410.21276.
[14] M. Abdin et al., “Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone,” May 23, 2024, arXiv: arXiv:2404.14219. doi: 10.48550/arXiv.2404.14219.
The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.