This research brief explores a multimodal approach to investigating student engagement in fully online, asynchronous engineering courses. The study integrates student self-reported emotions, instructor observations, and AI-assisted facial expression analysis to examine affective and attentional signals that may support a richer understanding of engagement during short problem-solving tasks embedded in online course modules. The proposed method is designed for online asynchronous learning environments, where instructors have limited access to the real-time behavioral cues that often inform support in face-to-face classrooms. To support preliminary development of the AI workflow, this study uses the publicly available DIPSER dataset to examine the extent to which a vision-language model can infer student emotion and attention from cropped facial images and head-pose information. Results suggest that model predictions are more plausible when facial cues are visually clear, but less reliable when expressions are subtle or ambiguous, underscoring the limitations of single-frame analysis. These findings support the feasibility of the proposed approach while motivating future research in online asynchronous engineering courses to examine how triangulated engagement data can inform instructional design and student support.
http://orcid.org/0000-0002-0084-951X
Embry-Riddle Aeronautical University
http://orcid.org/0000-0002-1047-7617
Embry-Riddle Aeronautical University
The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.