This work-in-progress study examines how AI-supported multimodal instruction, aligned with the Common European Framework of Reference for Languages (CEFR), influences linguistic development and learner engagement in English-Medium Instruction (EMI) humanities learning among international engineering students in the College of Engineering at Shibaura Institute of Technology (SIT) in Japan. Two cohorts enrolled in a 14-week “History of Japan” module within an undergraduate engineering curriculum were compared: a pre-AI baseline cohort (Fall 2022, n = 22) and an AI-supported cohort (Spring 2025, n = 18). The 2022 cohort completed weekly reflective essays without AI assistance, while the 2025 cohort engaged with level-stratified multimodal resources developed using Claude 3.5 Sonnet and ElevenLabs. These resources paired highly detailed C2-level lecture slides with B2-level narrated summaries written for accessibility, a combination intended to reduce extraneous cognitive load while preserving conceptual challenge. Lexical development in English was measured using the CEFR-based Vocabulary Level Analyzer (CVLA) v3.0, intercultural sensitivity using the Miville-Guzman Universality-Diversity Scale–Short Form (MGUDS-S), and learner perceptions through surveys administered in Week 14. CVLA results indicated significant lexical improvement in both cohorts (Fall 2022: t = −2.244, p = .036; Spring 2025: t(17) = 2.72, p = .015, Cohen’s d = 0.64), with reduced post-test variance in the AI-supported cohort suggesting convergence toward higher proficiency. MGUDS-S analysis for the 2025 cohort (n = 12) showed a small, statistically non-significant change in overall intercultural sensitivity (p = .635), although the “Diversity of Contact” subscale showed a medium effect size (Cohen’s dz = 0.52).
A correlational analysis yielded a positive but non-significant association between lexical development and gains in intercultural sensitivity (r = +0.485, p = .110). Survey responses (n = 15) indicated that learners generally experienced the multimodal scaffolding as supportive, with notably higher ratings for visual slides than for audio-only narrations and a majority preference for the instructor’s voice over an AI-generated one. This study offers empirically informed design implications for ethically integrating AI as strategic scaffolding in EMI humanities learning within engineering education.
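As a rough consistency check on the statistics reported above, the sketch below recomputes two quantities from the published values alone. It rests on assumptions not stated in the abstract: that the Spring 2025 Cohen's d is the paired-samples form dz = t / sqrt(n), and that the correlation was computed on the same n = 12 respondents as the MGUDS-S analysis. The function names are illustrative only and are not taken from the study.

```python
import math


def cohens_dz_from_t(t: float, n: int) -> float:
    """Paired-samples effect size dz recovered from a paired t statistic.

    Assumes a paired design with n complete pairs, so dz = t / sqrt(n).
    """
    return t / math.sqrt(n)


def t_from_r(r: float, n: int) -> float:
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Spring 2025 CVLA result: t(17) = 2.72, so n = 18 pairs.
dz_2025 = cohens_dz_from_t(2.72, 18)

# Correlation r = 0.485; n = 12 is an assumption taken from the MGUDS-S sample.
t_corr = t_from_r(0.485, 12)

print(round(dz_2025, 2))  # 0.64
print(round(t_corr, 2))   # 1.75
```

Under these assumptions the recomputed dz agrees with the reported Cohen's d of 0.64, and a t of about 1.75 on 10 degrees of freedom is in line with a two-tailed p near .11.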
The full paper will be available to logged-in, registered conference attendees once the conference starts on June 21, 2026, and to all visitors after the conference ends on June 24, 2026.