This work-in-progress methods paper explores the viability of low-cost, open-source LLM technologies for qualitative content analysis in the domain of curricular development. The work began as part of one author’s dissertation research, in which large volumes of curricular data for mechanical, electrical, and civil engineering needed to be systematically assessed and qualitatively coded through a content analysis approach. The scope of that work is large, and its future expansion requires a novel approach to examining the data systematically and reliably as the dataset grows. Resource limitations, primarily in finances and available personnel, are a driving motivation: the work needs a system that is reliable at low cost. Advances in qualitative methods that leverage LLMs as a tool for assisted qualitative coding suggest the practice has merit; however, the use of small, locally hosted models remains relatively unexplored, especially in the content analysis context.
The implementation of this LLM system centers on the data being analyzed. The curricular data spans full programs across multiple universities and disciplines, resulting in thousands of entries for courses that engineering students are advised to take toward their degrees. These courses were previously assessed, and a codebook was developed to categorize them, as part of one author’s dissertation work. This data and the resulting codebook serve as the “training and testing” data for this study. The low-cost, open-source LLM selected to power this system is Qwen 2.5-7B.
Current research in this space suggests that LLMs struggle with the intricate nuance of codebooks designed by and for human researchers. This system was therefore instructed to iteratively adapt the codebook into a version better suited to LLM use. In each iteration, the system refines the codebook using the training data, validates itself against the testing data, and documents the results thoroughly to comply with auditing procedures and best practice in qualitative research. This process of partial data analysis is inspired by the practice of qualitative researchers examining a portion of their data prior to convergence. Preliminary results suggest good alignment for narrowly defined code descriptions, while more ambiguous codes continue to diverge over time. Future work will include application to an expanded dataset, testing on other datasets, and a review of more advanced LLMs for best fit. Applying this system to a dynamic dataset is intended to offer a reliable, automated solution for evaluating engineering curricula at a scale that enables administrators and curriculum developers to assess their own programs against a standardized understanding of engineering curricula.
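The refine, validate, and document cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system used in the study: the function names (`per_code_agreement`, `run_iterations`, `keyword_classify`, `keyword_refine`), the toy codebook, and the course titles are all hypothetical, and simple keyword matching stands in for the LLM calls that would classify courses and rewrite code descriptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]   # (course text, human-assigned code)
Codebook = Dict[str, str]   # code name -> code description


def per_code_agreement(examples: List[Example],
                       predict: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of examples per human code where the prediction matches."""
    hits, totals = defaultdict(int), defaultdict(int)
    for text, human_code in examples:
        totals[human_code] += 1
        if predict(text) == human_code:
            hits[human_code] += 1
    return {code: hits[code] / totals[code] for code in totals}


def run_iterations(codebook: Codebook, train: List[Example], test: List[Example],
                   classify: Callable[[Codebook, str], str],
                   refine: Callable[[Codebook, List[Example]], Codebook],
                   n_iter: int = 3):
    """Refine on training data, validate on testing data, log every step."""
    audit_log = []
    for i in range(n_iter):
        # Training disagreements drive the next codebook revision.
        misses = [(t, c) for t, c in train if classify(codebook, t) != c]
        codebook = refine(codebook, misses)
        scores = per_code_agreement(test, lambda t: classify(codebook, t))
        # Keep a full record of each iteration for later auditing.
        audit_log.append({"iteration": i + 1, "train_misses": len(misses),
                          "codebook": dict(codebook), "test_agreement": scores})
    return codebook, audit_log


# --- Hypothetical stand-ins for the LLM (assumption: keyword overlap) ---
def keyword_classify(codebook: Codebook, text: str) -> str:
    words = set(text.lower().split())
    return max(codebook,
               key=lambda code: len(words & set(codebook[code].lower().split())))


def keyword_refine(codebook: Codebook, misses: List[Example]) -> Codebook:
    updated = dict(codebook)  # copy so the earlier version is preserved
    for text, code in misses:
        updated[code] = updated[code] + " " + text
    return updated


# Toy data, invented purely for illustration.
initial_codebook = {"design": "design project", "math": "calculus algebra"}
train = [("differential equations", "math"), ("capstone design project", "design")]
test = [("linear algebra", "math"), ("ordinary differential equations", "math"),
        ("senior design", "design")]

baseline = per_code_agreement(test, lambda t: keyword_classify(initial_codebook, t))
final_codebook, audit_log = run_iterations(
    initial_codebook, train, test, keyword_classify, keyword_refine, n_iter=2)
```

Reporting agreement per code rather than as a single overall figure is a deliberate choice in this sketch: it makes it possible to see which code descriptions align well and which keep diverging between iterations, which is the distinction the preliminary results draw.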
The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.