2025 ASEE Annual Conference & Exposition

Engineering Student Early Dropout Prediction in Regional Universities Using Multimodal AI

Presented at ERM WIP IV: Examining Undergraduate Recruitment & Retention

The overall dropout rate of engineering students in the United States is approximately 50%. However, the severity of dropout varies significantly across universities. Top engineering schools at national universities have retention rates above 90%, whereas regional universities and campuses experience much higher rates of student attrition from engineering programs. Reducing the dropout rate at regional universities is therefore the most effective and economical way to increase the retention and graduation of engineering students nationwide. However, the dropout problem at regional universities has not been systematically studied. Students at regional universities are more diverse in personal background and academic preparation, and they often face unique challenges, such as financial constraints, daily commutes, and the need to balance academic pursuits with employment and family responsibilities. As a result, findings based on data from national universities cannot easily be applied to regional universities or campuses.

This study is the first attempt to reduce the dropout rate of engineering students at regional universities by using the latest multimodal AI for end-to-end prediction. The machine learning framework was trained on data from five main sources: 1) high school information, 2) demographic information, 3) college and department program information, 4) academic information, including coursework and research activities, and 5) real-time student feedback through the web and course learning management systems. The first three categories have stable, long-term effects on the decision to drop out, whereas the data in categories 4 and 5 show moderate to high variation. Combining all of these data captures long-term to short-term influences on dropout decisions in a static, dynamic, and cumulative manner.
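One way to realize the static, dynamic, and cumulative views described above is sketched below in Python, assuming one record per student per term. The `TermRecord` fields, the feature names, and the use of running totals are hypothetical illustrations, not the study's actual schema: static fields (categories 1 to 3) are copied unchanged into every row, per-term values (categories 4 and 5) are kept as dynamic features, and running totals supply the cumulative view.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical per-term record; the field names are illustrative only.
@dataclass
class TermRecord:
    student_id: str
    term: int              # term index, 1 = first enrolled term
    credits_earned: float  # category 4: credits completed this term
    grade_points: float    # category 4: credits_earned * term GPA
    lms_logins: int        # category 5: LMS activity this term


def build_features(static: dict, terms: list[TermRecord]) -> list[dict]:
    """Return one feature row per term, combining static, dynamic, and cumulative views."""
    rows = []
    cum_credits = 0.0
    cum_points = 0.0
    cum_logins = 0
    ordered = sorted(terms, key=lambda r: r.term)
    for n_seen, rec in enumerate(ordered, start=1):
        cum_credits += rec.credits_earned
        cum_points += rec.grade_points
        cum_logins += rec.lms_logins
        row = dict(static)  # categories 1-3: same values in every row
        row.update(
            term=rec.term,
            term_credits=rec.credits_earned,  # dynamic: this term only
            term_lms_logins=rec.lms_logins,   # dynamic: this term only
            cum_credits=cum_credits,          # cumulative: all terms so far
            cum_gpa=(cum_points / cum_credits) if cum_credits > 0 else None,
            mean_lms_logins=cum_logins / n_seen,
        )
        rows.append(row)
    return rows


# Hypothetical example student with two terms supplied out of order.
static = {"hs_gpa": 3.4, "first_generation": 1, "major": "ME"}
terms = [
    TermRecord("s1", 2, 12.0, 12.0 * 3.0, 40),
    TermRecord("s1", 1, 15.0, 15.0 * 3.2, 60),
]
rows = build_features(static, terms)
```

Each row can then be scored independently, so a student's prediction can change from term to term as the dynamic and cumulative features evolve while the static features stay fixed.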

The data were preprocessed and split into training, validation, and testing datasets for machine learning. Text and images were processed by ChatGPT 4o, and the features it returned were integrated into a local model consisting of LightGBM and XGBoost for risk prediction. The model predicts each student's probability of completing the engineering program, ranging from 0% (dropout) to 100% (graduation). For supervised learning, the fraction of required credits a student has completed toward the degree program, linearly scaled between 0 and 1, serves as the target. The student's graduation probability is updated repeatedly until that student either graduates or drops out.
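The target construction and data split described above might look like the following minimal Python sketch. Several choices here are assumptions rather than details from the study: the split is done by student so that one student's term records never appear in both training and testing, the split fractions are placeholders, and the LightGBM and XGBoost outputs are combined by a simple weighted average, since the abstract does not say how the two models are combined.

```python
import random


def completion_target(credits_completed: float, credits_required: float) -> float:
    """Fraction of required credits completed, linearly scaled and clipped to [0, 1]."""
    if credits_required <= 0:
        raise ValueError("credits_required must be positive")
    return min(max(credits_completed / credits_required, 0.0), 1.0)


def split_by_student(student_ids, train_frac=0.70, val_frac=0.15, seed=0):
    """Assumption: assign whole students (not individual rows) to train/val/test
    so that no student's term records leak across splits."""
    ids = sorted(set(student_ids))
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }


def blend(pred_lgbm, pred_xgb, w=0.5):
    """Assumption: weighted average of the two models' outputs, clipped to [0, 1]."""
    if len(pred_lgbm) != len(pred_xgb):
        raise ValueError("prediction lists must have the same length")
    return [min(max(w * a + (1 - w) * b, 0.0), 1.0) for a, b in zip(pred_lgbm, pred_xgb)]


# Example: a student with 60 of 120 required credits has target 0.5.
print(completion_target(60, 120))  # 0.5
```

In practice, `pred_lgbm` and `pred_xgb` could come from the `predict` methods of `lightgbm.LGBMRegressor` and `xgboost.XGBRegressor` fitted on the training students, with the weight `w` tuned on the validation students. Clipping keeps the blended output inside the same [0, 1] range as the target.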

This study takes advantage of recent progress in AI. First, it accepts free-form text and images as input, which also reduces the workload and potential bias in survey design. Second, modern gradient-boosting frameworks such as XGBoost and LightGBM can capture more complex, nonlinear relationships in data. They are more tolerant of imbalanced and missing data, are more resistant to overfitting, and can handle categorical data, delivering state-of-the-art performance on tabular data. Finally, support for SHapley Additive exPlanations (SHAP) makes it easier to explain individual predictions, which is essential for mitigating potential biases introduced by machine learning.
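To make the SHAP idea concrete, the sketch below computes exact Shapley values by enumerating every feature coalition for a toy, hypothetical risk function with three features. It uses a single baseline vector to stand in for "absent" features, which is a simplification of what the `shap` library does (its `TreeExplainer` computes these values efficiently for tree ensembles such as XGBoost and LightGBM). The key property is additivity: the per-feature attributions sum to the difference between the prediction for the student and the baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 (coalitions exclude i)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score with an interaction between features 0 and 2.
def toy_model(z):
    return 0.3 * z[0] + 0.5 * z[1] + 0.2 * z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

For this toy model, the interaction term `0.2 * z[0] * z[2]` is split evenly between features 0 and 2, giving attributions of about 0.4, 0.5, and 0.1, which sum to f(x) - f(baseline) = 1.0. Brute-force enumeration grows exponentially with the number of features, so it is only practical for small illustrations like this one.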

This research study was approved by the university's Institutional Review Board (IRB); the university is not named, in compliance with double-blind review guidelines.

Authors
  1. Dr. Bin Chen Purdue University Fort Wayne
  2. Irah Modry-Caron Purdue University Fort Wayne
Note

The full paper will be available to logged-in, registered conference attendees once the conference starts on June 22, 2025, and to all visitors after the conference ends on June 25, 2025.