This work builds on a collaboration between machine learning and engineering education researchers to develop a human-computer partnership that uses machine learning to analyze student narratives of understanding in written short-answer responses to conceptually challenging questions [1], [2]. This study investigates student thinking in written short-answer responses in which students justify their answer choices to conceptually challenging multiple-choice concept questions of the kind often used in peer instruction and other active learning practices [3], [4]. The study then uses Large Language Models (LLMs) to automate the analysis of these responses.
Having students write short-answer responses to justify their reasoning on multiple-choice concept questions has been shown to improve engagement and conceptual understanding and to have an overall positive effect on students’ answers [5], [6]. We conceptualize the responses that students write as “narratives of understanding,” because students use a combination of everyday and disciplinary language to construct a short-answer response around their understanding.
We utilized written responses available from consenting students in the Concept Warehouse (CW) [7], a web-based tool for active learning. Two related questions from chemical engineering thermodynamics were coded by humans using emergent and inductive coding approaches [8], [9]: an enthalpy of mixing question (1396 responses) and an entropy of mixing question (1387 responses).
The written responses were then analyzed using LLM-based coding methods. We used in-context learning with GPT-4 [10]: we prompted the model with the question, four in-context example answers with their corresponding codes, and instructions to generate the code(s) for a new answer instance. We evaluated the model on a test set of 140 samples for each of the thermodynamics questions. Using both manual and language model-based coding, we aim to answer two research questions:
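The in-context learning setup described above can be sketched as a prompt-construction step: the concept question, four worked (response, codes) examples, then the uncoded response. This is a minimal illustration only; the example responses and code labels below are hypothetical, not the study's actual prompts or codebook.

```python
# Sketch of few-shot prompt construction for LLM-based coding.
# All example responses and code labels are illustrative placeholders.

def build_prompt(question, examples, new_response):
    """Assemble an in-context-learning prompt: the concept question,
    worked (response, codes) examples, then the uncoded response."""
    parts = [f"Question: {question}", ""]
    for response, codes in examples:
        parts.append(f"Student response: {response}")
        parts.append(f"Codes: {', '.join(codes)}")
        parts.append("")
    parts.append(f"Student response: {new_response}")
    parts.append("Codes:")  # cue the model to complete with code labels
    return "\n".join(parts)

# Hypothetical in-context examples (four, as in the study's setup).
examples = [
    ("The gases are ideal, so there are no interactions.", ["identification"]),
    ("Gas A is hotter than gas B, so energy transfers.", ["comparison", "inference"]),
    ("Mixing ideal gases does not change the enthalpy.", ["identification"]),
    ("Entropy increases because there are more arrangements.", ["inference"]),
]

prompt = build_prompt(
    "Does the enthalpy change when two ideal gases mix?",
    examples,
    "Since the gases are ideal, enthalpy stays the same.",
)
```

The resulting string would then be sent to the model (e.g., via a chat-completion API call), and the generated completion parsed back into code labels for comparison against the human coding.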
1. What aspects of student thinking are present in narratives of understanding constructed to justify conceptual questions about the enthalpy and entropy of mixing ideal gases?
2. To what extent can we use Large Language Models to automate qualitative coding of student narratives of understanding?
Through this coding process, we identified elements of student thinking that describe the narratives students created to convey their understanding: identification, comparison, and inference. In this paper, we conceptualize and discuss these three aspects of student thinking and how they convey an understanding of these concepts. When compared against manually coded responses, GPT-4 generates codes with an F1 score of 54%, precision of 61%, and recall of 49% for the enthalpy of mixing question. For the entropy of mixing question, the model has an F1 score of 47%, precision of 48%, and recall of 47%. This study aims to contribute to the body of work that investigates applications of natural language processing to education research and, more generally, to qualitative coding processes.
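Since each response may carry more than one code, the precision, recall, and F1 figures above can be understood as aggregates over individual code assignments. The sketch below shows one common way to compute such metrics (micro-averaging over true/false positives and false negatives); whether the study used micro- or macro-averaging is not stated here, and the code labels are illustrative.

```python
# Micro-averaged precision/recall/F1 for multi-label qualitative codes.
# Each response is represented as a set of code labels.

def micro_prf(human, model):
    """human, model: parallel lists of code sets, one set per response."""
    tp = fp = fn = 0
    for h, m in zip(human, model):
        tp += len(h & m)   # codes both coders assigned
        fp += len(m - h)   # codes only the model assigned
        fn += len(h - m)   # codes only the human assigned
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative comparison on three responses.
human = [{"identification"}, {"comparison", "inference"}, {"inference"}]
model = [{"identification"}, {"comparison"}, {"inference", "comparison"}]
p, r, f1 = micro_prf(human, model)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
# prints "precision=0.75 recall=0.75 f1=0.75"
```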