Identifying struggling students has long been a key objective for educators and institutions. It enables timely interventions that can improve student retention and graduation rates, two critical components of most institutional missions. This paper reports on our experience developing and training machine learning models that identify struggling students from Learning Management System (LMS) data. Our findings indicate that while these models perform poorly when predicting over long horizons, such as an entire semester, they perform significantly better over shorter timespans.
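To make the short-horizon setting concrete, the sketch below trains a classifier on one week of aggregated LMS activity to predict which students will struggle the following week. This is a minimal illustration under assumed inputs, not the paper's actual pipeline: the feature names (logins, submissions, avg_quiz_score, forum_posts), the label, and the synthetic data are all hypothetical placeholders.

```python
# Minimal sketch of a short-horizon predictor: given aggregated LMS
# activity for one week, predict whether a student struggles in the
# next week. Features, label, and data are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 2000  # synthetic (student, week) rows

df = pd.DataFrame({
    "logins": rng.poisson(5, n),
    "submissions": rng.poisson(2, n),
    "avg_quiz_score": rng.uniform(0, 100, n),
    "forum_posts": rng.poisson(1, n),
})
# Synthetic label: low quiz scores and low activity raise struggle risk.
risk = 0.6 * (df["avg_quiz_score"] < 50) + 0.4 * (df["logins"] < 3)
df["struggling_next_week"] = (rng.uniform(0, 1, n) < risk).astype(int)

X = df.drop(columns=["struggling_next_week"])
y = df["struggling_next_week"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```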
We focus on LMS data because of its ubiquity and because instructors who teach with an LMS have direct access to the collected data without submitting additional applications or navigating multiple levels of approval. The task is nonetheless challenging for several reasons. First, course-related data is inherently imbalanced: struggling students typically make up a small minority of a class. Second, many factors can adversely affect student performance, they can arise at any point during the semester, and most lie outside the LMS and are therefore not captured by its data. Finally, signs of student struggle often take time to become evident in the data.
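The paper does not say here how it handles the imbalance; one common mitigation, shown below on synthetic data, is cost-sensitive learning via class weights, evaluated with F1 rather than accuracy, since accuracy is misleading when one class dominates.

```python
# Sketch of one common mitigation for class imbalance: class-weighted
# training plus an imbalance-aware metric. Synthetic data only; this
# is not the paper's method.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic course-like data: ~10% "struggling" minority class.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Cost-sensitive learning: penalize minority-class errors more heavily.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)

# F1 balances precision and recall on the minority class, unlike
# accuracy, which a trivial majority-class predictor can inflate.
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"Mean F1 over 5 folds: {scores.mean():.3f}")
```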
This paper proposes a process through which struggling students within courses can be accurately and regularly identified. The process is evaluated using data spanning a three-year period from a public, four-year university. The data includes 274 lower-division Computer Science courses delivered in various formats (face-to-face, online, and virtual), involving 2,656 students and 37 instructors.