A significant gap in education is the lack of mechanisms that enable early
detection of potentially at-risk students. With an earlier prediction of
student performance, instructors have ample time to meet with and assist
under-achieving students. As with any prediction modeling problem, there are many
predictors to choose from when formulating a model. Previous related works have
shown limited success in predicting course performance using students’ personal and
socioeconomic traits. Because students learn by asking clarifying questions,
discussion boards have long been a staple of learning at the university level. This
paper uses participation in discussion forums to predict final student
performance. Using students’ course grades at roughly the halfway point in the term
and various discussion forum predictors, our model predicts the students’ final
percentage score. Using the model’s prediction, instructors can speak with at-risk
students and discuss ways to improve. The student grades and discussion board
participation datasets are gathered from a graduate-level Electrical and Computer
Engineering (ECE) course at Duke University. Various classical machine learning
models are explored, with random forest yielding the highest accuracy. This random
forest model, trained on discussion forum participation data, surpasses other similarly
trained state-of-the-art models. Furthermore, related research treats the task as a
classification problem, predicting the discrete letter grade a student will earn.
A letter grade is not a precise representation of a student’s performance; we
therefore address the regression problem of predicting the exact percentage a student will earn.
A significant finding of this paper is that our random forest model can predict student
performance with an average error of approximately 2.3%. Additionally, our random
forest model can generalize to a different graduate-level course and make
performance predictions with an average error of 3.3%. The final important finding is
that a model including discussion board predictors outperforms a model whose sole
predictor is the students’ halfway-point grade. This indicates that discussion forums
hold significant value in determining final performance. We envision that the
knowledge from our findings and our optimal random forest model can enable
instructors to identify and support potentially at-risk students preemptively.
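
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes scikit-learn's RandomForestRegressor, treats "average error" as mean absolute error in percentage points, and uses synthetic data in place of the course dataset. The feature names (midterm_grade, posts_written, posts_read) and the way the synthetic final score is generated are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_students = 200

# Synthetic stand-in data (assumption): NOT the course dataset from the paper.
midterm_grade = rng.uniform(60, 100, n_students)   # halfway-point grade, in percent
posts_written = rng.poisson(5, n_students)         # hypothetical forum predictor
posts_read = rng.poisson(30, n_students)           # hypothetical forum predictor

# Hypothetical relationship used only to generate a target to predict.
final_score = np.clip(
    10 + 0.8 * midterm_grade + 0.6 * posts_written + 0.1 * posts_read
    + rng.normal(0, 2, n_students),
    0, 100,
)

# Full feature set versus a baseline that sees only the halfway-point grade.
X_full = np.column_stack([midterm_grade, posts_written, posts_read])
X_base = midterm_grade.reshape(-1, 1)


def evaluate(X, y):
    """Fit a random forest regressor and return test-set mean absolute error."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0
    )
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)
    return float(mean_absolute_error(y_test, model.predict(X_test)))


mae_full = evaluate(X_full, final_score)
mae_base = evaluate(X_base, final_score)
print(f"MAE with forum predictors: {mae_full:.2f} percentage points")
print(f"MAE with midterm grade only: {mae_base:.2f} percentage points")
```

Both models are evaluated on the same held-out split, so any difference between the two errors reflects only the added forum predictors. The numbers printed here come from synthetic data and say nothing about the results reported in the paper.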