2025 ASEE Annual Conference & Exposition

Study on the Use of Random Forest Classifier model and Multi Output Classifier model for Predicting Student Academic Performance and Identifying Area of Concern

Presented at Computers in Education Division (COED) Track 6.A

This paper explores the use of machine learning to identify key factors that may be connected to a student's academic performance and to predict student learning outcomes at an early stage. Two machine learning models are used: the Random Forest classifier and the Multi-Output classifier. The Random Forest classifier is widely used for classification tasks. It constructs multiple decision trees during training and, for a given input, selects the mode of their predictions; it can also be used to identify the most significant factors affecting outcomes. A Multi-Output classifier, on the other hand, is designed for multi-label or multi-output classification tasks, where each instance can be linked to multiple labels or output variables. It can predict several target variables simultaneously, for example a student's grade and engagement level; our Multi-Output classifier uses a neural network backend. In this paper, several datasets sourced from Kaggle, containing student background information and academic engagement and performance data, were processed using these two classifier models. The steps for cleaning, preparing, and analyzing the data are described. The results show that the Random Forest classifier is very effective in identifying key factors that may be connected to a student's academic engagement and performance, such as the number of units completed in the previous semester, grades from the previous semester, and tuition fee payment status. It achieved an accuracy of 85.9% on the test data: 94.5% correct for non-dropouts and 67.9% correct for dropouts. Furthermore, the same data were processed by the Multi-Output classifier neural network, yielding accuracy scores ranging from 83.5% to 94.2% for the five target variables. These results provide valuable insights to educators advocating tailored support for at-risk students.
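
The voting step and the per-class accuracy figures described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the three one-split "trees" are hand-written stand-ins for trained decision trees, and the records, feature names, and thresholds are all hypothetical values chosen for illustration. The sketch only shows how a forest takes the mode of its trees' votes and how overall accuracy can differ from the accuracy within each class.

```python
from collections import Counter

# Hypothetical toy records (NOT from the paper's datasets):
# (units_completed, prior_grade, tuition_paid, dropped_out)
records = [
    (6, 13.5, 1, 0),
    (5, 12.0, 1, 0),
    (0, 0.0, 0, 1),
    (2, 10.5, 0, 1),
    (6, 11.0, 0, 0),
    (1, 9.0, 1, 1),
    (4, 12.5, 1, 0),
    (3, 10.0, 1, 1),
]

# Hand-written one-split "trees" standing in for trained decision trees.
# Each returns 1 (dropout) or 0 (non-dropout); thresholds are assumptions.
def tree_units(r):
    return 1 if r[0] < 3 else 0

def tree_grade(r):
    return 1 if r[1] < 10.5 else 0

def tree_tuition(r):
    return 1 if r[2] == 0 else 0

TREES = [tree_units, tree_grade, tree_tuition]

def forest_predict(record):
    """Return the mode (majority vote) of the individual tree predictions."""
    votes = [tree(record) for tree in TREES]
    return Counter(votes).most_common(1)[0][0]

def accuracy_report(rows):
    """Compute overall accuracy and accuracy within each true class."""
    hits = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for row in rows:
        actual = row[3]
        totals[actual] += 1
        if forest_predict(row) == actual:
            hits[actual] += 1
    return {
        "overall": (hits[0] + hits[1]) / len(rows),
        "non_dropout": hits[0] / totals[0],
        "dropout": hits[1] / totals[1],
    }

report = accuracy_report(records)
print(report)
```

On this toy data the forest misclassifies one dropout, so accuracy within the dropout class is lower than the overall figure, the same pattern the abstract reports for the real test data.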

Authors

Mr. Kevin Huang, Troy High School

Kevin Huang is a student at Troy High School in Fullerton, CA. As a participant in the Troy Tech program, he has worked as a student intern in several professors' research labs in the College of Engineering and Computer Science at California State University, Fullerton, studying the use of machine learning for predicting student academic performance and for identifying at-risk students for early warning and possible intervention.
Ivan Zimmerman
Dr. Doina Bein (http://orcid.org/0000-0002-5072-1979), California State University, Fullerton

Dr. Bein has an extensive publication record: 13 book chapters, 19 journal articles, and 69 conference papers. Four of her conference papers have received best paper awards. She was awarded (as PI or co-PI) several research and teaching grants from AF

Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 22, 2025, and to all visitors after the conference ends on June 25, 2025.

