2025 ASEE Annual Conference & Exposition

RFE: Machine Learning for Student Reasoning during Challenging Concept Questions - Year 2

Presented at NSF Grantees Poster Session II

In this NSF Grantees Poster Session paper, we describe our progress on a project funded by the NSF Research in the Formation of Engineers (RFE) program, a collaboration between engineering education researchers at [institution blinded for peer review] and machine learning researchers at [institution blinded for peer review] that uses machine learning to understand the reasoning in students' short-answer responses to challenging concept questions in mechanics and thermodynamics [1]–[4]. Concept questions are multiple-choice questions that require little to no mathematics and ask students to solve problems using recently learned concepts [5], [6]. Writing short-answer justifications to concept questions has been shown to improve student engagement and learning outcomes, and these responses can provide a wealth of information to instructors and researchers about student understanding [7]–[9]. However, such large volumes of free-form text are difficult to analyze manually. Researchers have used machine learning to automate feedback and grading, provide tutoring, and conduct additional analyses of short- and long-answer texts [10]–[19]. Recently, Transformer-based large language models (LLMs) [20] have been applied to qualitative research because of their generative capabilities, prompting education and machine learning researchers to examine their use more closely. For this project, we have the following goals:
- For instructors: Provide information about patterns, trends, and ideas in student thinking that they can draw on in their instructional practices and pedagogical decision-making.
- For education researchers: Provide ways to analyze student understanding across various institutional contexts at a scale not feasible with manual coding.
Here, we describe our work applying state-of-the-art Transformer LLMs (including T5 [21], GPT-3 [22], GPT-4 [23], Mixtral of Experts [24], and ATLAS.ti's Intentional AI Coding powered by OpenAI [25]) to the task of analyzing student responses to concept questions in mechanics and chemical engineering thermodynamics. We then expand upon the work done in Year 2 to improve our language models and to progress toward a generative AI tool that automates the analysis of student responses for the [tool blinded for peer review].
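To illustrate the kind of automation this line of work points toward, the short Python sketch below shows how a chat-completion LLM could be prompted to assign a single reasoning code to one student justification. This is a minimal, hypothetical example rather than the project's actual pipeline: the category labels, prompt wording, model choice, and use of the OpenAI chat-completions interface are illustrative assumptions only.

    # Minimal illustrative sketch (not the project's pipeline): prompt an LLM to
    # assign one reasoning code to a student's short-answer justification.
    # The categories and prompt wording below are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    CATEGORIES = ["correct reasoning", "partially correct", "misconception", "off-topic"]

    def code_justification(question: str, justification: str) -> str:
        """Return a single category label for one short-answer justification."""
        prompt = (
            f"Concept question:\n{question}\n\n"
            f"Student justification:\n{justification}\n\n"
            f"Classify the student's reasoning into exactly one of: {', '.join(CATEGORIES)}.\n"
            "Reply with the category name only."
        )
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # keep the labeling as deterministic as possible
        )
        return response.choices[0].message.content.strip()

Applying the same pattern across a full set of responses, and comparing the resulting labels against human coding with inter-rater agreement metrics, would be the natural next step in such a workflow.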

References
[1] Authors, “Paper blinded for peer review,” 2022.
[2] Authors, “Paper blinded for peer review,” 2023.
[3] Authors, “Paper blinded for peer review,” 2024a.
[4] Authors, “Paper blinded for peer review,” 2024b.
[5] E. Mazur, Peer Instruction: A User’s Manual (Series in Educational Innovation). Prentice Hall, 1997.
[6] C. H. Crouch and E. Mazur, “Peer Instruction: Ten years of experience and results,” Am. J. Phys., vol. 69, no. 9, pp. 970–977, Sep. 2001, doi: 10.1119/1.1374249.
[7] M. D. Koretsky, B. J. Brooks, R. M. White, and A. S. Bowen, “Querying the questions: Student responses and reasoning in an active learning class,” J. Eng. Educ., vol. 105, no. 2, pp. 219–244, 2016, doi: 10.1002/jee.20116.
[8] M. D. Koretsky, B. J. Brooks, and A. Z. Higgins, “Written justifications to multiple-choice concept questions during active learning in class,” Int. J. Sci. Educ., vol. 38, no. 11, pp. 1747–1765, Jul. 2016, doi: 10.1080/09500693.2016.1214303.
[9] E. Wheeler and R. L. McDonald, “Writing in engineering courses,” J. Eng. Educ., vol. 89, no. 4, pp. 481–486, 2000, doi: 10.1002/j.2168-9830.2000.tb00555.x.
[10] X. Zhai, Y. Yin, J. W. Pellegrino, K. C. Haudek, and L. Shi, “Applying machine learning in science assessment: a systematic review,” Stud. Sci. Educ., vol. 56, no. 1, pp. 111–151, Jan. 2020, doi: 10.1080/03057267.2020.1735757.
[11] X. Zhai, K. C. Haudek, L. Shi, R. H. Nehm, and M. Urban-Lurain, “From substitution to redefinition: A framework of machine learning-based science assessment,” J. Res. Sci. Teach., vol. 57, no. 9, pp. 1430–1459, 2020, doi: 10.1002/tea.21658.
[12] X. Zhai, K. C. Haudek, C. Wilson, and M. Stuhlsatz, “A framework of construct-irrelevant variance for contextualized constructed response assessment,” Front. Educ., vol. 6, 2021, Accessed: Feb. 08, 2024. [Online]. Available: https://www.frontiersin.org/articles/10.3389/feduc.2021.751283
[13] X. Zhai, L. Shi, and R. H. Nehm, “A meta-analysis of machine learning-based science assessments: Factors impacting machine-human score agreements,” J. Sci. Educ. Technol., vol. 30, no. 3, pp. 361–379, Jun. 2021, doi: 10.1007/s10956-020-09875-z.
[14] X. Zhai, J. Krajcik, and J. W. Pellegrino, “On the validity of machine learning-based Next Generation Science assessments: A validity inferential network,” J. Sci. Educ. Technol., vol. 30, no. 2, pp. 298–312, Apr. 2021, doi: 10.1007/s10956-020-09879-9.
[15] K. C. Haudek and X. Zhai, “Examining the effect of assessment construct characteristics on machine learning scoring of scientific argumentation,” Int. J. Artif. Intell. Educ., Dec. 2023, doi: 10.1007/s40593-023-00385-8.
[16] S. Maestrales, X. Zhai, I. Touitou, Q. Baker, B. Schneider, and J. Krajcik, “Using machine learning to score multi-dimensional assessments of chemistry and physics,” J. Sci. Educ. Technol., vol. 30, no. 2, pp. 239–254, 2021.
[17] S. Hilbert et al., “Machine learning for the educational sciences,” Rev. Educ., vol. 9, no. 3, p. e3310, 2021, doi: 10.1002/rev3.3310.
[18] P. P. Martin, D. Kranz, P. Wulff, and N. Graulich, “Exploring new depths: Applying machine learning for the analysis of student argumentation in chemistry,” J. Res. Sci. Teach., advance online publication, doi: 10.1002/tea.21903.
[19] B. J. Yik, A. J. Dood, D. C. R. de Arellano, K. B. Fields, and J. R. Raker, “Development of a machine learning-based tool to evaluate correct Lewis acid–base model use in written responses to open-ended formative assessment items,” Chem. Educ. Res. Pract., vol. 22, no. 4, pp. 866–885, 2021.
[20] A. Vaswani et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2017. Accessed: Aug. 09, 2024. [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
[21] C. Raffel et al., “Exploring the limits of transfer learning with a unified Text-to-Text Transformer,” Jul. 28, 2020, arXiv: arXiv:1910.10683. Accessed: Apr. 03, 2023. [Online]. Available: http://arxiv.org/abs/1910.10683
[22] T. B. Brown et al., “Language models are few-shot learners,” Jul. 22, 2020, arXiv: arXiv:2005.14165. Accessed: Apr. 03, 2023. [Online]. Available: http://arxiv.org/abs/2005.14165
[23] OpenAI et al., “GPT-4 Technical Report,” Mar. 04, 2024, arXiv: arXiv:2303.08774. doi: 10.48550/arXiv.2303.08774.
[24] A. Q. Jiang et al., “Mixtral of Experts,” Jan. 08, 2024, arXiv: arXiv:2401.04088. doi: 10.48550/arXiv.2401.04088.
[25] “AI Coding powered by OpenAI,” ATLAS.ti. [Online]. Available: https://atlasti.com/ai-coding-powered-by-openai

Authors
  1. Namrata Shivagunde, University of Massachusetts Lowell
  2. Anna Rumshisky, University of Massachusetts Lowell
  3. Dr. Milo Koretsky, Tufts University