This is an abstract for a work-in-progress paper in the methods/theory category of the ASEE Educational Research and Methods (ERM) Division.
Undergraduate engineering courses at a large Canadian university, particularly those taught in the first year, use discussion boards to field and answer student questions. The literature shows that intelligently categorizing these questions can reveal the learning patterns of a student body and where it may be struggling. In practice, however, this is difficult because of the time and effort required to examine the hundreds of student questions that can be asked over a term.
The motivation behind this paper is to explore how the analysis of student questions can be automated. Automation would provide instructors with new insight from discussion boards that they can use to improve their courses and their pedagogy. These efforts focus on the task of text categorization, as the literature shows that the most prevalent method of analyzing student questions is to categorize them (Chin, 2008).
More specifically, this paper evaluates the viability of using large language models (LLMs) to efficiently categorize student questions with a question taxonomy that Goldberg et al. (2021) developed to help improve engineering courses. This taxonomy classifies questions into six categories: (i) unspecific questions, (ii) questions about definitions, (iii) questions about doing something, (iv) questions about doing something if the problem’s conditions have changed, (v) questions about understanding how or why something happens, and (vi) questions that try to extend knowledge to circumstances beyond what has previously been covered.
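For illustration only, the taxonomy can be encoded as a simple mapping from category labels to descriptions that could later be embedded in a classification prompt. The short labels below are shorthand chosen here, not terms from the original taxonomy; only the descriptions paraphrase Goldberg et al. (2021).

```python
# Illustrative encoding of the six-category question taxonomy (Goldberg et al., 2021).
# The label keys are shorthand introduced for this sketch; the values paraphrase the categories.
QUESTION_TAXONOMY = {
    "unspecific": "Unspecific questions",
    "definition": "Questions about definitions",
    "how_to": "Questions about doing something",
    "changed_conditions": "Questions about doing something if the problem's conditions have changed",
    "understanding": "Questions about understanding how or why something happens",
    "extension": "Questions that extend knowledge beyond what has previously been covered",
}
```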
Open-source LLMs are used to categorize questions from the discussion board through a multistage process. First, the entries containing questions are isolated. These entries are then paraphrased as single-sentence questions, with entries containing multiple questions split into several. The paraphrased questions are then classified into the prescribed categories using descriptions and examples drawn from the literature and generated by the authors. Various non-conversational and conversational LLMs are evaluated on each task. Performance on the paraphrasing task is judged holistically by the authors, while performance on all other tasks is determined by how closely the LLM’s decisions align with those of the authors, who are subject-matter experts for the courses being analyzed. Combinations of the best-performing models for each task are then assembled into candidate LLM pipelines. Finally, the categorization produced by these pipelines is compared to the categorization done by the authors.
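The sketch below illustrates the general shape of such a multistage pipeline; it is not the authors' implementation. The `query_llm` helper, the prompt wording, and the fallback behaviour are all assumptions standing in for whichever open-source model serves each stage.

```python
from typing import List, Tuple

# Hypothetical helper: sends a prompt to an open-source LLM and returns its text response.
# The name and signature are assumptions; wire this to whatever model each stage uses.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("Connect this to your chosen open-source LLM.")

# Shorthand labels mirroring the taxonomy sketch above.
CATEGORIES = ["unspecific", "definition", "how_to", "changed_conditions", "understanding", "extension"]

def contains_question(entry: str) -> bool:
    """Stage 1: isolate discussion-board entries that contain a question."""
    answer = query_llm(f"Does the following post contain a question? Answer yes or no.\n\n{entry}")
    return answer.strip().lower().startswith("yes")

def paraphrase_questions(entry: str) -> List[str]:
    """Stage 2: paraphrase an entry into single-sentence questions,
    splitting entries that contain multiple questions."""
    response = query_llm(
        "Rewrite each question in the following post as a single standalone sentence, "
        f"one per line:\n\n{entry}"
    )
    return [line.strip() for line in response.splitlines() if line.strip()]

def categorize(question: str) -> str:
    """Stage 3: assign one of the six taxonomy categories to a paraphrased question."""
    response = query_llm(
        "Classify this question into exactly one of these categories: "
        f"{', '.join(CATEGORIES)}.\n\nQuestion: {question}\nCategory:"
    )
    label = response.strip().lower()
    # Fall back to the catch-all category if the model's reply is not an exact label.
    return label if label in CATEGORIES else "unspecific"

def run_pipeline(entries: List[str]) -> List[Tuple[str, str]]:
    """Chain the stages over raw discussion-board entries."""
    results = []
    for entry in entries:
        if not contains_question(entry):
            continue
        for question in paraphrase_questions(entry):
            results.append((question, categorize(question)))
    return results
```

In a pipeline of this shape, the model behind `query_llm` can differ per stage, which is what allows the best-performing model for each task to be swapped in when assembling candidate pipelines.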
The categorization done by the LLM pipelines is found to be comparable in accuracy to classification approaches reported in the literature, in which individuals were asked to categorize student questions from courses that they do not teach (Goldberg et al., 2021).
In conclusion, this study shows that while LLMs may not match the manual analysis of student questions performed by a course instructor, they can still serve as efficient tools for leveraging a resource that would otherwise be infeasible to analyze, helping to inform one’s engineering pedagogy.